


WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 4:

C07K 13/00, C12P 21/00 
C12N 15/00, C07H 15/12 



A1



(11) International Publication Number: WO 88/09344

(43) International Publication Date: 1 December 1988 (01.12.88)



(21) International Application Number: PCT/US88/01737 

(22) International Filing Date: 19 May 1988 (19.05.88) 



(31) Priority Application Number: 

(32) Priority Date: 

(33) Priority Country : 



(60) Parent Application or Grant

(63) Related by Continuation
US 052,800 (CIP)
Filed on 21 May 1987 (21.05.87)



(71) Applicant (for all designated States except US): CREA- 
TIVE BIOMOLECULES, INC. [US/US]; 35 South 
Street, Hopkinton, MA 01748 (US). 



(72) Inventors; and 

(75) Inventors/Applicants (for US only): HUSTON, James,
S. [US/US]; 41 Whittemore Road, Newton, MA 
02158 (US). OPPERMANN, Hermann [US/US]; 25 
Summernill Road, Medway, MA 02053 (US). 

(74) Agent: PITCHER, Edmund, R.; Lahive & Cockfield, 
60 State Street, Boston, MA 02109 (US). 



(81) Designated States: AT (European patent), AU, BE (Eu- 
ropean patent), CH (European patent), DE (Euro- 
pean patent), DK, FI, FR (European patent), GB 
(European patent), IT (European patent), JP, LU 
(European patent), NL (European patent), NO, SE 
(European patent), US. 



Published 

With international search report 



(54) Title: TARGETED MULTIFUNCTIONAL PROTEINS 



(57) Abstract 



Disclosed are a family of synthetic proteins having binding affinity for a preselected antigen, and multifunctional
proteins having such affinity. The proteins are characterized by one or more sequences of amino acids constituting a
region which behaves as a biosynthetic antibody binding site (BABS). The sites comprise VH-VL or VL-VH-like single chains
wherein the VH and VL-like sequences are attached by a polypeptide linker, or individual VH or VL-like domains. The
binding domains comprise linked CDR and FR regions, which may be derived from separate immunoglobulins. The proteins
may also include other polypeptide sequences which function, e.g., as an enzyme, toxin, binding site, or site for
attachment to an immobilization media or radioactive atom. Methods are disclosed for producing the proteins, for designing
BABS having any specificity that can be elicited by in vivo generation of antibody, for producing analogs thereof and for
producing multifunctional synthetic proteins which are self-targeted by virtue of their binding site region.

TARGETED MULTIFUNCTIONAL PROTEINS

The United States Government has rights in 
this application pursuant to small business 
innovation research grant numbers SSS-4 R43 
CA39870-01 and SSS-4 2 R44 CA39870-02. 

Reference to Related Applications 

This application is a continuation-in-part 
of copending U.S. application serial number 052,800 
filed May 21, 1987, the disclosure of which is 
incorporated herein by reference. 

Background of the Invention 

This invention relates to novel compositions 
of matter, hereinafter called targeted 
multifunctional proteins, useful, for example, in 
specific binding assays, affinity purification, 
biocatalysis, drug targeting, imaging, immunological 
treatment of various oncogenic and infectious 
diseases, and in other contexts. More specifically,
this invention relates to biosynthetic proteins 
expressed from recombinant DNA as a single 
polypeptide chain comprising plural regions, one of 
which has a structure similar to an antibody binding 
site, and an affinity for a preselected antigenic 
determinant, and another of which has a separate 
function, and may be biologically active, designed to 
bind to ions, or designed to facilitate
immobilization of the protein. This invention also 
relates to the binding proteins per se, and methods 
for their construction. 

There are five classes of human antibodies.
Each has the same basic structure (see Figure 1), or
multiple thereof, consisting of two identical
polypeptides called heavy (H) chains (molecular
weight approximately 50,000 d) and two identical
light (L) chains (molecular weight approximately
25,000 d). Each of the five antibody classes has a
similar set of light chains and a distinct set of
heavy chains. A light chain is composed of one
variable and one constant domain, while a heavy chain
is composed of one variable and three or more
constant domains. The combined variable domains of a
paired light and heavy chain are known as the Fv
region, or simply "Fv". The Fv determines the
specificity of the immunoglobulin; the constant
regions have other functions.

Amino acid sequence data indicate that each 
variable domain comprises three hypervariable regions 
or loops, sometimes called complementarity 
determining regions or "CDRs", flanked by four
relatively conserved framework regions or "FRs"
(Kabat et al., Sequences of Proteins of
Immunological Interest [U.S. Department of Health and
Human Services, third edition, 1983, fourth edition,
1987]). The hypervariable regions have been assumed
to be responsible for the binding specificity of
individual antibodies and to account for the
diversity of binding of antibodies as a protein class.

Monoclonal antibodies have been used both as
diagnostic and therapeutic agents. They are 
routinely produced according to established 
procedures by hybridomas generated by fusion of mouse 
lymphoid cells with an appropriate mouse myeloma cell 
line. 

The literature contains a host of references 
to the concept of targeting bioactive substances such 
as drugs, toxins, and enzymes to specific points in
the body to destroy or locate malignant cells or to
induce a localized drug or enzymatic effect. It has
been proposed to achieve this effect by conjugating
the bioactive substance to monoclonal antibodies
(see, e.g., Vogel, Immunoconjugates: Antibody
Conjugates in Radioimaging and Therapy of Cancer,
1987, N.Y., Oxford University Press; and Ghose et al.
(1978) J. Natl. Cancer Inst. 61:657-676). However,
non-human antibodies induce an immune response when 
injected into humans. Human monoclonal antibodies 
may alleviate this problem, but they are difficult to 
produce by cell fusion techniques since, among other 
problems, human hybridomas are notably unstable, and 
removal of immunized spleen cells from humans is not 
feasible. 

Chimeric antibodies composed of human and 
non-human amino acid sequences potentially have 
improved therapeutic value as they presumably would 
elicit less circulating human antibody against the 
non-human immunoglobulin sequences. Accordingly, 
hybrid antibody molecules have been proposed which 
consist of amino acid sequences from different 
mammalian sources. The chimeric antibodies designed 
thus far comprise variable regions from one mammalian
source, and constant regions from human or another
mammalian source (Morrison et al. (1984) Proc. Natl.
Acad. Sci. U.S.A. 81:6851-6855; Neuberger et al.
(1984) Nature 312:604-608; Sahagan et al. (1986) J.
Immunol. 137:1066-1074; EPO application nos.
84302368.0, Genentech; 85102665.8, Research
Development Corporation of Japan; 85305604.2,
Stanford; P.C.T. application no. PCT/GB85/00392,
Celltech Limited).

It has been reported that binding function 
is localized to the variable domains of the antibody 
molecule located at the amino terminal end of both 
the heavy and light chains. The variable regions 
remain noncovalently associated (as VH-VL dimers,
termed Fv regions) even after proteolytic cleavage
from the native antibody molecule, and retain much of
their antigen recognition and binding capabilities
(see, for example, Inbar et al., Proc. Natl. Acad.
Sci. U.S.A. (1972) 69:2659-2662; Hochman et al.
(1973) Biochem. 12:1130-1135; and (1976) Biochem.
15:2706-2710; Sharon and Givol (1976) Biochem.
15:1591-1594; Rosenblatt and Haber (1978) Biochem.
17:3877-3882; Ehrlich et al. (1980) Biochem.
19:4091-4096). Methods of manufacturing two-chain
Fv substantially free of constant region using
recombinant DNA techniques are disclosed in U.S. 
4,642,334 and corresponding published specification 
EP 088,994. 

Summary of the Invention 

In one aspect the invention provides a 
single chain multifunctional biosynthetic protein 
expressed from a single gene derived by recombinant 
DNA techniques. The protein comprises a biosynthetic 
antibody binding site (BABS) comprising at least one 
protein domain capable of binding to a preselected 
antigenic determinant. The amino acid sequence of 
the domain is homologous to at least a portion of the 
sequence of a variable region of an immunoglobulin 
molecule capable of binding the preselected antigenic 
determinant. Peptide bonded to the binding site is a 
polypeptide consisting of an effector protein having 
a conformation suitable for biological activity in a 
mammal, an amino acid sequence capable of 
sequestering ions, or an amino acid sequence capable 
of selective binding to a solid support.

In another aspect, the invention provides 
a biosynthetic binding site protein comprising a single
polypeptide chain defining two polypeptide domains 
connected by a polypeptide linker. The amino acid 
sequence of each of the domains comprises a set of 
complementarity determining regions (CDRs) interposed 
between a set of framework regions (FRs), each of 
which is respectively homologous with at least a 
portion of the CDRs and FRs from an immunoglobulin
molecule. At least one of the domains comprises a 
set of CDR amino acid sequences and a set of FR amino 
acid sequences at least partly homologous to 
different immunoglobulins. The two polypeptide 
domains together define a hybrid synthetic binding
site having specificity for a preselected antigen, 
determined by the selected CDRs. 

In still another aspect, the invention 
provides a biosynthetic binding protein comprising a
single polypeptide chain defining two domains
connected by a polypeptide linker. The amino acid
sequence of each of the domains comprises a set of 
CDRs interposed between a set of FRs, each of which 
is respectively homologous with at least a portion of 
the CDRs and FRs from an immunoglobulin molecule. 
The linker comprises plural, peptide-bonded amino 
acids defining a polypeptide of a length sufficient 
to span the distance between the C terminal end of 
one of the domains and N terminal end of the other 
when the binding protein assumes a conformation 
suitable for binding. The linker comprises
hydrophilic amino acids which together preferably 
constitute a hydrophilic sequence. Linkers which 
assume an unstructured polypeptide configuration in 
aqueous solution work well. The binding protein is 
capable of binding to a preselected antigenic site, 
determined by the collective tertiary structure of 
the sets of CDRs held in proper conformation by the 
sets of FRs. Preferably, the binding protein has a 
specificity at least substantially identical to the 
binding specificity of the immunoglobulin molecule 
used as a template for the design of the CDR 
regions. Such structures can have a binding affinity 
of at least 10^6 M^-1, and preferably 10^8 M^-1.

In preferred aspects, the FRs of the binding 
protein are homologous to at least a portion of the 
FRs from a human immunoglobulin, the linker spans at 
least about 40 angstroms; a polypeptide spacer is
incorporated in the multifunctional protein between
the binding site and the second polypeptide; and the
binding protein has an affinity for the preselected
antigenic determinant no less than two orders of
magnitude less than the binding affinity of the
immunoglobulin molecule used as a template for the
CDR regions of the binding protein. The preferred
linkers and spacers are cysteine-free. The linker
preferably comprises amino acids having unreactive
side groups, e.g., alanine and glycine. Linkers and
spacers can be made by combining plural consecutive
copies of an amino acid sequence, e.g., (Gly4-Ser)3.
The invention also provides DNAs encoding
these proteins and host cells harboring and capable 
of expressing these DNAs. 

As used herein, the phrase biosynthetic 
antibody binding site or BABS means synthetic 
proteins expressed from DNA derived by recombinant 
techniques. BABS comprise biosynthetically produced 
sequences of amino acids defining polypeptides 
designed to bind with a preselected antigenic 
material. The structure of these synthetic 
polypeptides is unlike that of naturally occurring 
antibodies, fragments thereof, e.g., Fv, or known 
synthetic polypeptides or "chimeric antibodies" in 
that the regions of the BABS responsible for 
specificity and affinity of binding (analogous to
native antibody variable regions) are linked by
peptide bonds, expressed from a single DNA, and may
themselves be chimeric, e.g., may comprise amino acid
sequences homologous to portions of at least two
different antibody molecules. The BABS embodying the
invention are biosynthetic in the sense that they are 
synthesized in a cellular host made to express a 
synthetic DNA, that is, a recombinant DNA made by 
ligation of plural, chemically synthesized
oligonucleotides, or by ligation of fragments of DNA 
derived from the genome of a hybridoma, mature B cell 
clone, or a cDNA library derived from such natural 
sources. The proteins of the invention are properly 
characterized as "binding sites" in that these 
synthetic molecules are designed to have specific 
affinity for a preselected antigenic determinant. 
The polypeptides of the invention comprise structures 
patterned after regions of native antibodies known to 
be responsible for antigen recognition. 

Accordingly, it is an object of the 
invention to provide novel multifunctional proteins 
comprising one or more effector proteins and one or 
more biosynthetic antibody binding sites, and to 
provide DNA sequences which encode the proteins. 
Another object is to provide a generalized method for 
producing biosynthetic antibody binding site 
polypeptides of any desired specificity. 

Brief Description of the Drawing

The foregoing and other objects of this 
invention, the various features thereof, as well as 
the invention itself, may be more fully understood 
from the following description, when read together 
with the accompanying drawings. 

Figure 1A is a schematic representation of 
an intact IgG antibody molecule containing two light 
chains, each consisting of one variable and one 
constant domain, and two heavy chains, each 
consisting of one variable and three constant 
domains. Figure 1B is a schematic drawing of the
structure of Fv proteins (and DNA encoding them)
illustrating VH and VL domains, each of which
comprises four framework (FR) regions and three
complementarity determining (CDR) regions. Boundaries
of CDRs are indicated, by way of example, for
monoclonal 26-10, a well known and characterized
murine monoclonal specific for digoxin.

Figures 2A-2E are schematic representations
of some of the classes of reagents constructed in 
accordance with the invention, each of which 
comprises a biosynthetic antibody binding site. 

Figure 3 discloses five amino acid sequences 
(heavy chains) in single letter code lined up 
vertically to facilitate understanding of the 
invention. Sequence 1 is the known native sequence 
of VH from murine monoclonal glp-4
(anti-lysozyme). Sequence 2 is the known native
sequence of VH from murine monoclonal 26-10
(anti-digoxin). Sequence 3 is a BABS comprising the
FRs from 26-10 VH and the CDRs from glp-4 VH.
The CDRs are identified in lower case letters;
restriction sites in the DNA used to produce chimeric
sequence 3 are also identified. Sequence 4 is the
known native sequence of VH from human myeloma
antibody NEWM. Sequence 5 is a BABS comprising the
FRs from NEWM VH and the CDRs from glp-4 VH,
i.e., illustrates a "humanized" binding site having a
human framework but an affinity for lysozyme similar
to murine glp-4.

Figures 4A-4F are the synthetic nucleic acid 
sequences and encoded amino acid sequences of (4A) 
the heavy chain variable domain of murine 
anti-digoxin monoclonal 26-10; (4B) the light chain 
variable domain of murine anti-digoxin monoclonal
26-10; (4C) a heavy chain variable domain of a BABS 
comprising CDRs of glp-4 and FRs of 26-10; (4D) a 
light chain variable region of the same BABS; (4E) a 
heavy chain variable region of a BABS comprising CDRs 
of glp-4 and FRs of NEWM; and (4F) a light chain 
variable region comprising CDRs of glp-4 and FRs of 
NEWM. Delineated are FRs, CDRs, and restriction 
sites for endonuclease digestion, most of which were 
introduced during design of the DNA. 

Figure 5 is the nucleic acid and encoded
amino acid sequence of a host DNA (VH) designed to
facilitate insertion of CDRs of choice. The DNA was
designed to have unique 6-base sites directly
flanking the CDRs so that relatively small
oligonucleotides defining portions of CDRs can be
readily inserted, and to have other sites to
facilitate manipulation of the DNA to optimize
binding properties in a given construct. The
framework regions of the molecule correspond to
murine FRs (Figure 4A).

Figures 6A and 6B are multifunctional 
proteins (and DNA encoding them) comprising a single 
chain BABS with the specificity of murine monoclonal 
26-10, linked through a spacer to the FB fragment of
protein A, here fused as a leader, and constituting a
binding site for Fc. The spacer comprises the 11
C-terminal amino acids of the FB followed by Asp-Pro
(a dilute acid cleavage site). The single chain BABS
comprises sequences mimicking the VH and VL (6A)
and the VL and VH (6B) of murine monoclonal
26-10. The VL in construct 6A is altered at
residue 4 where valine replaces methionine present in
the parent 26-10 sequence. These constructs contain
binding sites for both Fc and digoxin. Their
structure may be summarized as:

(6A) FB-Asp-Pro-VH-(Gly4-Ser)3-VL,

and

(6B) FB-Asp-Pro-VL-(Gly4-Ser)3-VH,

where (Gly4-Ser)3 is a polypeptide linker.
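
By way of illustration only, the domain order of constructs 6A and 6B may be sketched in Python as follows. The function name build_construct and the constant names are hypothetical and are introduced solely for this sketch; the segment labels are taken from the summary above, and no actual amino acid sequence from the figures is reproduced.

```python
# One Gly4-Ser repeat, written as three-letter residue names.
LINKER_UNIT = ["Gly"] * 4 + ["Ser"]
# Three consecutive copies give the (Gly4-Ser)3 linker.
LINKER = LINKER_UNIT * 3


def build_construct(first_domain, second_domain):
    """Return the ordered segment labels of an FB-spacer-BABS fusion.

    Labels only: FB leader, Asp-Pro cleavage site, first variable
    domain, linker, second variable domain.
    """
    return ["FB", "Asp-Pro", first_domain, "(Gly4-Ser)3", second_domain]


construct_6a = build_construct("VH", "VL")
construct_6b = build_construct("VL", "VH")

print("-".join(construct_6a))  # FB-Asp-Pro-VH-(Gly4-Ser)3-VL
print("-".join(construct_6b))  # FB-Asp-Pro-VL-(Gly4-Ser)3-VH
print(len(LINKER))             # 15 residues in the linker
```

The two constructs differ only in the order of the variable domains on either side of the linker.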

In Figures 4A-4E and 6A and 6B, the amino
acid sequences of the expression products start after
the GAATTC sequence, which codes for an EcoRI splice
site, translated as Glu-Phe on the drawings.
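
The reading of the EcoRI site as Glu-Phe follows from the standard genetic code, in which GAA encodes glutamic acid and TTC encodes phenylalanine. A minimal Python sketch is given below; the lookup table is deliberately limited to these two codons, and the names CODON_TO_RESIDUE and translate_site are hypothetical.

```python
# Only the two codons present in the EcoRI recognition sequence.
CODON_TO_RESIDUE = {"GAA": "Glu", "TTC": "Phe"}

ECORI_SITE = "GAATTC"


def translate_site(dna):
    """Translate an in-frame DNA string three bases at a time."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "-".join(CODON_TO_RESIDUE[codon] for codon in codons)


print(translate_site(ECORI_SITE))  # Glu-Phe
```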

Figure 7A is a graph of percent of maximum
counts bound of radioiodinated digoxin versus
concentration of binding protein adsorbed to the
plate comparing the binding of native 26-10 (curve 1)
and the construct of Figure 6A and Figure 2B
renatured using two different procedures (curves 2
and 3). Figure 7B is a graph demonstrating the
bifunctionality of the FB-(26-10) BABS adhered to
microtiter plates through the specific binding of the
binding site to the digoxin-BSA coat on the plate.
Figure 7B shows the percent inhibition of
125I-rabbit-IgG binding to the FB domain of the FB
BABS by the addition of IgG, protein A, FB, murine
IgG2a, and murine IgG1.

Figure 8 is a schematic representation of a 
model assembled DNA sequence encoding a
multifunctional biosynthetic protein comprising a 
leader peptide (used to aid expression and thereafter 
cleaved), a binding site, a spacer, and an effector 
molecule attached as a trailer sequence. 

Figures 9A-9E are exemplary synthetic nucleic
acid sequences and corresponding encoded amino acid
sequences of binding sites of different
specificities: (A) FRs from NEWM and CDRs from 26-10
having the digoxin specificity of murine monoclonal
26-10; (B) FRs from 26-10, and CDRs from G-loop-4
(glp-4) having lysozyme specificity; (C) FRs and CDRs
from MOPC-315 having dinitrophenol (DNP) specificity;
(D) FRs and CDRs from an anti-CEA monoclonal
antibody; (E) FRs in both VH and VL, and CDR1
and CDR3 in VH, and CDR1, CDR2, and CDR3 in
VL from an anti-CEA monoclonal antibody; CDR2 in
VH is a CDR2 consensus sequence found in most
immunoglobulin VH regions.

Figure 10A is a schematic representation of 
the DNA and amino acid sequence of a leader peptide 
(MLE) protein with corresponding DNA sequence and 
some major restriction sites. Figure 10B shows the 
design of an expression plasmid used to express 
MLE-BABS (26-10). During construction of the gene, 
fusion partners were joined at the EcoRI site that is
shown as part of the leader sequence. The pBR322
plasmid, opened at the unique SspI and PstI sites,
was combined in a 3-part ligation with an SspI to
EcoRI fragment bearing the trp promoter and MLE
leader and with an EcoRI to PstI fragment carrying
the BABS gene. The resulting expression vector
confers tetracycline resistance on positive
transformants.

Figure 11 is an SDS-polyacrylamide gel (15%) 
of the (26-10) BABS at progressive stages of 
purification. Lane 0 shows low molecular weight 
standards; lane 1 is the MLE-BABS fusion protein; 
lane 2 is an acid digest of this material; lane 3 is 
the pooled DE-52 chromatographed protein; lanes 4 and 
5 are the same ouabain-Sepharose pool of single chain
BABS except that lane 4 protein is reduced and lane 5 
protein is unreduced. 

Figure 12 shows inhibition curves for 26-10 
BABS and 26-10 Fab species, and indicates the
relative affinities of the antibody fragment for the 
indicated cardiac glycosides. 

Figures 13A and 13B are plots of digoxin 
binding curves. (A) shows 26-10 BABS binding 
isotherm and Sips plot (inset), and (B) shows 26-10
Fab binding isotherm and Sips plot (inset). 

Figure 14 is a nucleic acid sequence and 
corresponding amino acid sequence of a modified FB 
dimer leader sequence and various restriction sites. 

Figures 15A-15H are nucleic acid sequences
and corresponding amino acid sequences of
biosynthetic multifunctional proteins including a 
single chain BABS and various biologically active 
protein trailers linked via a spacer sequence. Also 
indicated are various endonuclease digestion sites. 
The trailing sequences are (A) epidermal growth 
factor (EGF); (B) streptavidin; (C) tumor necrosis 
factor (TNF); (D) calmodulin; (E) platelet derived 
growth factor-beta (PDGF-beta); (F) ricin; (G)
interleukin-2; and (H) an FB-FB dimer.

Description

The invention will first be described in its 
broadest overall aspects with a more detailed 
description following.

A class of novel biosynthetic, bi- or
multifunctional proteins has now been designed and
engineered which comprise biosynthetic antibody
binding sites, that is, "BABS" or biosynthetic
polypeptides defining structure capable of selective 
antigen recognition and preferential antigen binding, 
and one or more peptide-bonded additional protein or 
polypeptide regions designed to have a preselected 
property. Examples of the second region include 
amino acid sequences designed to sequester ions, 
which makes the protein suitable for use as an 
imaging agent, and sequences designed to facilitate 
immobilization of the protein for use in affinity 
chromatography and solid phase immunoassay. Another 
example of the second region is a bioactive effector 
molecule, that is, a protein having a conformation 
suitable for biological activity, such as an enzyme, 
toxin, receptor, binding site, growth factor, cell 
differentiation factor, lymphokine, cytokine, 
hormone, or anti-metabolite. This invention features
synthetic, multifunctional proteins comprising these
regions peptide bonded to one or more biosynthetic
antibody binding sites; synthetic, single chain
proteins designed to bind preselected antigenic
determinants with high affinity and specificity;
constructs containing multiple binding sites linked
together to provide multipoint antigen binding and
high net affinity and specificity; DNA encoding these
proteins prepared by recombinant techniques; host
cells harboring these DNAs; and methods for the
production of these proteins and DNAs.

The invention requires recombinant 
production of single chain binding sites having
affinity and specificity for a predetermined 
antigenic determinant. This technology has been 
developed and is disclosed herein. In view of this 
disclosure, persons skilled in recombinant DNA 
technology, protein design, and protein chemistry can 
produce such sites which, when disposed in solution, 
have high binding constants (at least 10^6,
preferably 10^8 M^-1) and excellent specificity.
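
To give a sense of scale for these binding constants, an association constant Ka may be restated as a dissociation constant Kd = 1/Ka and as a standard free energy of binding, dG = -RT ln(Ka). The Python sketch below is illustrative only. The temperature of 298.15 K and the function and constant names are assumptions made for this sketch and do not appear in the disclosure.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3      # kcal/(mol*K)
ASSUMED_TEMPERATURE_K = 298.15    # assumption: 25 degrees C


def dissociation_constant(ka_per_molar):
    """Kd in mol/L for an association constant Ka in L/mol."""
    return 1.0 / ka_per_molar


def binding_free_energy(ka_per_molar, temperature_k=ASSUMED_TEMPERATURE_K):
    """Standard free energy of binding, dG = -RT ln(Ka), in kcal/mol."""
    return -GAS_CONSTANT_KCAL * temperature_k * math.log(ka_per_molar)


for ka in (1e6, 1e8):
    kd = dissociation_constant(ka)
    dg = binding_free_energy(ka)
    print(f"Ka = {ka:.0e} M^-1  Kd = {kd:.0e} M  dG = {dg:.1f} kcal/mol")
```

Under these assumptions, a binding constant of 10^6 M^-1 corresponds to a micromolar dissociation constant, and 10^8 M^-1 to a dissociation constant of 10 nanomolar.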

The design of the BABS is based on the 
observation that three subregions of the variable 
domain of each of the heavy and light chains of 
native immunoglobulin molecules collectively are 
responsible for antigen recognition and binding. 
Each of these subregions, called herein 
"complementarity determining regions" or CDRs, 
consists of one of the hypervariable regions or loops 
and of selected amino acids or amino acid sequences 
disposed in the framework regions or FRs which flank 
that particular hypervariable region. It has now 
been discovered that FRs from diverse species are 
effective to maintain CDRs from diverse other species
in proper conformation so as to achieve true
immunochemical binding properties in a biosynthetic
protein. It has also been discovered that
biosynthetic domains mimicking the structure of the
two chains of an immunoglobulin binding site may be
connected by a polypeptide linker while closely
approaching, retaining, and often improving their
collective binding properties.

The binding site region of the 
multifunctional proteins comprises at least one, and 
preferably two domains, each of which has an amino 
acid sequence homologous to portions of the CDRs of 
the variable domain of an immunoglobulin light or 
heavy chain, and other sequence homologous to the FRs 
of the variable domain of the same, or a second, 
different immunoglobulin light or heavy chain. The 
two domain binding site construct also includes a 
polypeptide linking the domains. Polypeptides so 
constructed bind a specific preselected antigen 
determined by the CDRs held in proper conformation by 
the FRs and the linker. Preferred structures have 
human FRs, i.e., mimic the amino acid sequence of at 
least a portion of the framework regions of a human 
immunoglobulin, and have linked domains which 
together comprise structure mimicking a VH-VL or
VL-VH immunoglobulin two-chain binding site. CDR
regions of a mammalian immunoglobulin, such as those 
of mouse, rat, or human origin are preferred. In one 
preferred embodiment, the biosynthetic antibody 
binding site comprises FRs homologous with a portion 
of the FRs of a human immunoglobulin and CDRs 
homologous with CDRs from a mouse or rat 
immunoglobulin. This type of chimeric polypeptide 
displays the antigen binding specificity of the mouse 
or rat immunoglobulin, while its human framework 
minimizes human immune reactions. In addition, the
chimeric polypeptide may comprise other amino acid
sequences. It may comprise, for example, a sequence
homologous to a portion of the constant domain of an
immunoglobulin, but preferably is free of constant
regions (other than FRs).
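
The arrangement of a chimeric domain, in which the CDRs of one immunoglobulin are interposed between the FRs of another, may be sketched as follows. The segments used are placeholders and are not actual antibody sequences; the function name graft is hypothetical. CDRs are rendered in lower case, following the convention of Figure 3.

```python
def graft(framework_regions, cdrs):
    """Interleave four FRs with three CDRs.

    The result has the order FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4.
    """
    if len(framework_regions) != 4 or len(cdrs) != 3:
        raise ValueError("expected 4 framework regions and 3 CDRs")
    segments = []
    # zip stops after the third pair, leaving FR4 to be appended last.
    for fr, cdr in zip(framework_regions, cdrs):
        segments.append(fr)
        segments.append(cdr.lower())  # CDRs in lower case
    segments.append(framework_regions[3])
    return "".join(segments)


# Placeholder segments only, NOT real antibody sequences.
human_frs = ["AAAA", "CCCC", "DDDD", "EEEE"]
mouse_cdrs = ["GGG", "HHH", "KKK"]

print(graft(human_frs, mouse_cdrs))  # AAAAgggCCCChhhDDDDkkkEEEE
```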

The binding site region(s) of the chimeric
proteins are thus single chain composite polypeptides
comprising a structure which in solution behaves like
an antibody binding site. The two domain, single
chain composite polypeptide has a structure patterned
after tandem VH and VL domains, but with the
carboxyl terminal of one attached through a linking
amino acid sequence to the amino terminal of the
other. The linking amino acid sequence may or may
not itself be antigenic or biologically active. It
preferably spans a distance of at least about 40 Å,
i.e., comprises at least about 14 amino acids, and
comprises residues which together present a
hydrophilic, relatively unstructured region. Linking
amino acid sequences having little or no secondary
structure work well. Optionally, one or a pair of
unique amino acids or amino acid sequences
recognizable by a site specific cleavage agent may be
included in the linker. This permits the VH and
VL-like domains to be separated after expression,
or the linker to be excised after refolding of the
binding site.
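
The stated linker preferences (at least about 14 amino acids, spanning at least about 40 Å, and free of cysteine) can be checked for a candidate linker with a short routine such as the one below. This is a sketch only. The per-residue span is simply the ratio of the two figures given above (40 Å over 14 residues) and is not an independently measured value; the names make_linker and check_linker are hypothetical.

```python
MIN_RESIDUES = 14        # "at least about 14 amino acids"
MIN_SPAN_ANGSTROM = 40.0  # "at least about 40 angstroms"
# Span per residue implied by the two figures above (an assumption).
IMPLIED_SPAN_PER_RESIDUE = MIN_SPAN_ANGSTROM / MIN_RESIDUES


def make_linker(unit="GGGGS", copies=3):
    """Build a linker from consecutive copies of a one-letter-code unit."""
    return unit * copies


def check_linker(linker):
    """Report length, estimated span, and cysteine content of a linker."""
    return {
        "length": len(linker),
        "estimated_span": len(linker) * IMPLIED_SPAN_PER_RESIDUE,
        "long_enough": len(linker) >= MIN_RESIDUES,
        "cysteine_free": "C" not in linker,
    }


report = check_linker(make_linker())
print(report["length"], report["long_enough"], report["cysteine_free"])
```

For the (Gly4-Ser)3 linker the routine reports 15 residues, which satisfies the length preference and contains no cysteine.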

Either the amino or carboxyl terminal ends 
(or both ends) of these chimeric, single chain
binding sites are attached to an amino acid sequence 
which itself is bioactive or has some other function 
to produce a bifunctional or multifunctional
protein. For example, the synthetic binding site may 
include a leader and/or trailer sequence defining a 
polypeptide having enzymatic activity, independent 
affinity for an antigen different from the antigen to 
which the binding site is directed, or having other 
functions such as to provide a convenient site of 
attachment for a radioactive ion, or to provide a 
residue designed to link chemically to a solid 
support. This fused, independently functional 
section of protein should be distinguished from fused 
leaders used simply to enhance expression in 
prokaryotic host cells or yeasts. The 
multifunctional proteins also should be distinguished 
from the "conjugates" disclosed in the prior art 
comprising antibodies which, after expression, are 
linked chemically to a second moiety. 

Often, a series of amino acids designed as a 
"spacer" is interposed between the active regions of 
the multifunctional protein. Use of such a spacer 
can promote independent refolding of the regions of 
the protein. The spacer also may include a specific 
sequence of amino acids recognized by an 
endopeptidase, for example, endogenous to a target 
cell (e.g., one having a surface protein recognized 
by the binding site) so that the bioactive effector 
protein is cleaved and released at the target. The 
second functional protein preferably is present as a 
trailer sequence, as trailers exhibit less of a 
tendency to interfere with the binding behavior of 
the BABS. 

The therapeutic use of such "self-targeted"
bioactive proteins offers a number of advantages over 
conjugates of immunoglobulin fragments or complete 
antibody molecules: they are stable, less 
immunogenic and have a lower molecular weight; they 
can penetrate body tissues more rapidly for purposes 
of imaging or drug delivery because of their smaller 
size; and they can facilitate accelerated clearance 
of targeted isotopes or drugs. Furthermore, because 
design of such structures at the DNA level as 
disclosed herein permits ready selection of 
bioproperties and specificities, an essentially 
limitless combination of binding sites and bioactive 
proteins is possible, each of which can be refined as 
disclosed herein to optimize independent activity at 
each region of the synthetic protein. The synthetic 
proteins can be expressed in procaryotes such as E. 
coli, and thus are less costly to produce than 
immunoglobulins or fragments thereof which require 
expression in cultured animal cell lines. 

The invention thus provides a family of 
recombinant proteins expressed from a single piece of 
DNA, all of which have the capacity to bind 
specifically with a predetermined antigenic 
determinant. The preferred species of the proteins 
comprise a second domain which functions 
independently of the binding region. In this aspect 
the invention provides an array of "self-targeted" 
proteins which have a bioactive function and which 
deliver that function to a locus determined by the 
binding site's specificity. It also provides 
biosynthetic binding proteins having attached 
polypeptides suitable for attachment to 
immobilization matrices which may be used in affinity 
chromatography and solid phase immunoassay 
applications, or suitable for attachment to ions, 
e.g., radioactive ions, which may be used for in vivo 
imaging. 

The successful design and manufacture of the 
proteins of the invention depends on the ability to 
produce biosynthetic binding sites, and most 
preferably, sites comprising two domains mimicking 
the variable domains of immunoglobulin connected by a 
linker. 

As is now well known, Fv, the minimum 
antibody fragment which contains a complete antigen 
recognition and binding site, consists of a dimer of 
one heavy and one light chain variable domain in 
noncovalent association (Figure 1A). It is in this 
configuration that the three complementarity 
determining regions of each variable domain interact 
to define an antigen binding site on the surface of 
the V H -V L dimer. Collectively, the six 
complementarity determining regions (see Figure 1B) 
confer antigen binding specificity to the antibody. 
FRs flanking the CDRs have a tertiary structure which 
is essentially conserved in native immunoglobulins of 
species as diverse as human and mouse. These FRs 
serve to hold the CDRs in their appropriate 
orientation. The constant domains are not required 
for binding function, but may aid in stabilizing 
V H -V L interaction. Even a single variable domain 
(or half of an Fv comprising only three CDRs specific 
for an antigen) has the ability to recognize and bind 
antigen, although at a lower affinity than an entire 
binding site (Painter et al. (1972) Biochem. 
11:1327-1337). 

This knowledge of the structure of 
immunoglobulin proteins has now been exploited to 
develop multifunctional fusion proteins comprising 
biosynthetic antibody binding sites and one or more 
other domains. 

The structure of these biosynthetic proteins 
in the region which imparts the binding properties to 
the protein is analogous to the Fv region of a 
natural antibody. It comprises at least one, and 
preferably two domains consisting of amino acids 
defining V H and V L -like polypeptide segments 
connected by a linker which together form the 
tertiary molecular structure responsible for affinity 
and specificity. Each domain comprises a set of 
amino acid sequences analogous to immunoglobulin CDRs 
held in appropriate conformation by a set of 
sequences analogous to the framework regions (FRs) of 
an Fv fragment of a natural antibody. 

The term CDR, as used herein, refers to 
amino acid sequences which together define the 
binding affinity and specificity of the natural Fv 
region of a native immunoglobulin binding site, or a 
synthetic polypeptide which mimics this function. 
CDRs typically are not wholly homologous to 
hypervariable regions of natural Fvs, but rather also 
may include specific amino acids or amino acid 
sequences which flank the hypervariable region and 
have heretofore been considered framework not 
directly determinative of complementarity. The term 
FR, as used herein, refers to amino acid sequences 
flanking or interposed between CDRs. 

The CDR and FR polypeptide segments are 
designed based on sequence analysis of the Fv region 
of preexisting antibodies or of the DNA encoding 
them. In one embodiment, the amino acid sequences 
constituting the FR regions of the BABS are analogous 
to the FR sequences of a first preexisting antibody, 
for example, a human IgG. The amino acid sequences 
constituting the CDR regions are analogous to the 
sequences from a second, different preexisting 
antibody, for example, the CDRs of a murine IgG. 
Alternatively, the CDRs and FRs from a single 
preexisting antibody from, e.g., an unstable or hard 
to culture hybridoma, may be copied in their entirety. 

Practice of the invention enables the design 
and biosynthesis of various reagents, all of which 
are characterized by a region having affinity for a 
preselected antigenic determinant. The binding site 
and other regions of the biosynthetic protein are 
designed with the particular planned utility of the 
protein in mind. Thus, if the reagent is designed 
for intravascular use in mammals, the FR regions may 
comprise amino acids similar or identical to at least 
a portion of the framework region amino acids of 
antibodies native to that mammalian species. On the 
other hand, the amino acids comprising the CDRs may 
be analogous to a portion of the amino acids from the 
hypervariable region (and certain flanking amino 
acids) of an antibody having a known affinity and 
specificity, e.g., a murine or rat monoclonal 
antibody. 

Other sections of native immunoglobulin 
protein structure, e.g., C H and C L, need not be 
present and normally are intentionally omitted from 
the biosynthetic proteins. However, the proteins of 
the invention normally comprise additional 
polypeptide or protein regions defining a bioactive 
region, e.g., a toxin or enzyme, or a site onto which 
a toxin or a remotely detectable substance can be 
attached. 

The invention thus can provide intact 
biosynthetic antibody binding sites analogous to 
V H -V L dimers, either non-covalently associated, 
disulfide bonded, or preferably linked by a 
polypeptide sequence to form a composite V H -V L or 
V L -V H polypeptide which may be essentially free 
of antibody constant region. The invention also 
provides proteins analogous to an independent V H or 
V L domain, or dimers thereof. Any of these 
proteins may be provided in a form linked to, for 
example, amino acids analogous or homologous to a 
bioactive molecule such as a hormone or toxin. 

Connecting the independently functional 
regions of the protein is a spacer comprising a short 
amino acid sequence whose function is to separate the 
functional regions so that they can independently 
assume their active tertiary conformation. The 
spacer can consist of an amino acid sequence present 
on the end of a functional protein which sequence is 
not itself required for its function, and/or specific 
sequences engineered into the protein at the DNA 
level. 

The spacer generally may comprise between 5 
and 25 residues. Its optimal length may be 
determined using constructs of different spacer 
lengths varying, for example, by units of 5 amino 
acids. The specific amino acids in the spacer can 
vary. Cysteines should be avoided. Hydrophilic 
amino acids are preferred. The spacer sequence may 
mimic the sequence of a hinge region of an 
immunoglobulin. It may also be designed to assume a 
structure, such as a helical structure. Proteolytic 
cleavage sites may be designed into the spacer 
separating the variable region-like sequences from 
other pendant sequences so as to facilitate cleavage 
of intact BABS, free of other protein, or so as to 
release the bioactive protein in vivo. 
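
The spacer guidelines above (5 to 25 residues, no 
cysteines, hydrophilic residues preferred, lengths 
varied by units of 5) can be sketched as a simple 
screening routine. The Python below is illustrative 
only: the function names, the set of residues treated 
as hydrophilic, and the one-half threshold are 
assumptions introduced here and are not part of the 
disclosure. 

```python
# Residues treated as hydrophilic (one-letter codes). This set is an
# assumption made for illustration; the text does not enumerate one.
HYDROPHILIC = set("DEKRSTNQH")


def check_spacer(seq, min_len=5, max_len=25):
    """Return a list of warnings for a candidate spacer sequence."""
    warnings = []
    if not min_len <= len(seq) <= max_len:
        warnings.append(f"length {len(seq)} outside {min_len}-{max_len} residues")
    if "C" in seq:
        warnings.append("contains cysteine")
    hydrophilic = sum(residue in HYDROPHILIC for residue in seq)
    # The 0.5 cutoff is arbitrary and stands in for "hydrophilic preferred".
    if seq and hydrophilic / len(seq) < 0.5:
        warnings.append("fewer than half of the residues are hydrophilic")
    return warnings


def spacer_lengths_to_test(min_len=5, max_len=25, unit=5):
    """Spacer lengths to construct, varying by units of `unit` residues."""
    return list(range(min_len, max_len + 1, unit))


print(check_spacer("SSTSEKDSRS"))  # hypothetical hydrophilic spacer
print(check_spacer("ACL"))         # too short, has cysteine, hydrophobic
print(spacer_lengths_to_test())
```

Such a check screens only the stated sequence 
constraints; the optimal length would still be 
determined empirically, as described above. 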

Figures 2A-2E illustrate five examples of 
protein structures embodying the invention that can 
be produced by following the teaching disclosed 
herein. All are characterized by a biosynthetic 
polypeptide defining a binding site 3, comprising 
amino acid sequences comprising CDRs and FRs, often 
derived from different immunoglobulins, or sequences 
homologous to a portion of CDRs and FRs from 
different immunoglobulins. Figure 2A depicts a 
single chain construct comprising a polypeptide 
domain 10 having an amino acid sequence analogous to 
the variable region of an immunoglobulin heavy chain, 
bound through its carboxyl end to a polypeptide 
linker 12, which in turn is bound to a polypeptide 
domain 14 having an amino acid sequence analogous to 
the variable region of an immunoglobulin light 
chain. Of course, the light and heavy chain domains 
may be in reverse order. Alternatively, the binding 
site may comprise two substantially homologous amino 
acid sequences which are both analogous to the 
variable region of an immunoglobulin heavy or light 
chain. 

The linker 12 should be long enough (e.g., 
about 15 amino acids or about 40 Å) to permit the 
chains 10 and 14 to assume their proper 
conformation. The linker 12 may comprise an amino 
acid sequence homologous to a sequence identified as 
'self' by the species into which it will be 
introduced, if drug use is intended. For example, 
the linker may comprise an amino acid sequence 
patterned after a hinge region of an immunoglobulin. 
The linker preferably comprises hydrophilic amino 
acid sequences. It may also comprise a bioactive 
polypeptide such as a cell toxin which is to be 
targeted by the binding site, or a segment easily 
labelled by a radioactive reagent which is to be 
delivered, e.g., to the site of a tumor comprising an 
epitope recognized by the binding site. The linker 
may also include one or two built-in cleavage sites, 
i.e., an amino acid or amino acid sequence 
susceptible to attack by a site specific cleavage 
agent as described below. This strategy permits the 
V H and V L domains to be separated after 
expression, or the linker to be excised after folding 
while retaining the binding site structure in 
non-covalent association. The amino acids of the 
linker preferably are selected from among those 
having relatively small, unreactive side chains. 
Alanine, serine, and glycine are preferred. 

Generally, the design of the linker involves 
considerations similar to the design of the spacer, 
excepting that binding properties of the linked 
domains are seriously degraded if the linker sequence 
is shorter than about 20 Å in length, i.e., comprises 
fewer than about 10 residues. Linkers longer than the 
approximate 40 Å distance between the N-terminus of a 
native variable region and the C-terminus of its 
sister chain may be used, but also potentially can 
diminish the BABS binding properties. Linkers 
comprising between 12 and 18 residues are preferred. 
The preferred length in specific constructs may be 
determined by varying linker length first by units of 
5 residues, and second by units of 1-4 residues after 
determining the best multiple of the pentameric 
starting units. 
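
The two-stage search for a preferred linker length 
described above (first by units of 5 residues, then 
by 1-4 residues around the best multiple) can be 
sketched as follows. This is a minimal sketch under 
stated assumptions: `measure_binding` is a 
hypothetical callable standing in for an experimental 
binding measurement on a construct having the given 
linker length, and the 10 to 25 residue search range 
is chosen here only for illustration. 

```python
def coarse_lengths(min_len=10, max_len=25, unit=5):
    """Linker lengths differing by units of `unit` residues."""
    return [n for n in range(unit, max_len + 1, unit) if n >= min_len]


def fine_lengths(best, min_len=10, unit=5):
    """Lengths 1 to unit-1 residues on either side of the best coarse length."""
    return [n for n in range(best - unit + 1, best + unit)
            if n != best and n >= min_len]


def choose_linker_length(measure_binding):
    """Pick the best multiple of 5 first, then refine around it."""
    best = max(coarse_lengths(), key=measure_binding)
    return max([best] + fine_lengths(best), key=measure_binding)


def mock_binding(length):
    """Made-up score peaking at 14 residues; not experimental data."""
    return -abs(length - 14)


print(coarse_lengths())
print(fine_lengths(15))
print(choose_linker_length(mock_binding))
```

In practice each call to `measure_binding` would 
correspond to expressing and assaying a separate 
construct, so the two-stage scheme limits the number 
of constructs that must be made. 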

Additional proteins or polypeptides may be 
attached to either or both the amino or carboxyl 
termini of the binding site to produce 
multifunctional proteins of the type illustrated in 
Figures 2B-2E. As an example, in Figure 2B, a 
helically coiled polypeptide structure 16 comprises a 
protein A fragment (FB) linked to the amino terminal 
end of a V H -like domain 10 via a spacer 18. Figure 
2C illustrates a bifunctional protein having an 
effector polypeptide 20 linked via spacer 22 to the 
carboxyl terminus of polypeptide 14 of binding 
protein segment 2. This effector polypeptide 20 may 
consist of, for example, a toxin, therapeutic drug, 
binding protein, enzyme or enzyme fragment, site of 
attachment for an imaging agent (e.g., to chelate a 
radioactive ion such as indium), or site of selective 
attachment to an immobilization matrix so that the 
BABS can be used in affinity chromatography or solid 
phase binding assay. This effector alternatively may 
be linked to the amino terminus of polypeptide 10, 
although trailers are preferred. Figure 2D depicts a 
trifunctional protein comprising a linked pair of 
BABS 2 having another distinct protein domain 20 
attached to the N-terminus of the first binding 
protein segment. Use of multiple BABS in a single 
protein enables production of constructs having very 
high selective affinity for multiepitopic sites such 
as cell surface proteins. 

The independently functional domains are 
attached by a spacer 18 (Figs. 2B and 2D) covalently 
linking the C-terminus of the protein 16 or 20 to the 
N-terminus of the first domain 10 of the binding 
protein segment 2, or by a spacer 22 linking the 
C-terminus of the second binding domain 14 to the 
N-terminus of another protein (Figs. 2C and 2D). The 
spacer may be an amino acid sequence analogous to 
linker sequence 12, or it may take other forms. As 
noted above, the spacer's primary function is to 
separate the active protein regions to promote their 
independent bioactivity and permit each region to 
assume its bioactive conformation independent of 
interference from its neighboring structure. 

Figure 2E depicts another type of reagent, 
comprising a BABS having only one set of three CDRs, 
e.g., analogous to a heavy chain variable region, 
which retains a measure of affinity for the antigen. 
Attached to the carboxyl end of the polypeptide 10 or 
14 comprising the FR and CDR sequences constituting 
the binding site 3 through spacer 22 is effector 
polypeptide 20 as described above. 

As is evidenced from the foregoing, the 
invention provides a large family of reagents 
comprising proteins, at least a portion of which 
defines a binding site patterned after the variable 
region of an immunoglobulin. It will be apparent 
that the nature of any protein fragments linked to 
the BABS, and used for reagents embodying the 
invention, is essentially unlimited, the essence of 
the invention being the provision, either alone or 
linked to other proteins, of binding sites having 
specificities to any antigen desired. 

The clinical administration of 
multifunctional proteins comprising a BABS, or a BABS 
alone, affords a number of advantages over the use of 
intact natural or chimeric antibody molecules, 
fragments thereof, and conjugates comprising such 
antibodies linked chemically to a second bioactive 
moiety. The multifunctional proteins described 
herein offer fewer cleavage sites to circulating 
proteolytic enzymes, their functional domains are 
connected by peptide bonds to polypeptide linker or 
spacer sequences, and thus the proteins have improved 
stability. Because of their smaller size and 
efficient design, the multifunctional proteins 
described herein reach their target tissue more 
rapidly, and are cleared more quickly from the body. 
They also have reduced immunogenicity. In addition, 
their design facilitates coupling to other moieties 
in drug targeting and imaging applications. Such 
coupling may be conducted chemically after expression 
of the BABS to a site of attachment for the coupling 
product engineered into the protein at the DNA 
level. Active effector proteins having toxic, 
enzymatic, binding, modulating, cell differentiating, 
hormonal, or other bioactivity are expressed from a 
single DNA as a leader and/or trailer sequence, 
peptide bonded to the BABS. 

Design and Manufacture 

The proteins of the invention are designed 
at the DNA level. The chimeric or synthetic DNAs are 
then expressed in a suitable host system, and the 
expressed proteins are collected and renatured if 
necessary. A preferred general structure of the DNA 
encoding the proteins is set forth in Figure 8. As 
illustrated, it encodes an optimal leader sequence 
used to promote expression in procaryotes having a 
built-in cleavage site recognizable by a site 
specific cleavage agent, for example, an 
endopeptidase, used to remove the leader after 
expression. This is followed by DNA encoding a 
V H -like domain, comprising CDRs and FRs, a linker, 
a V L -like domain, again comprising CDRs and FRs, a 
spacer, and an effector protein. After expression, 
folding, and cleavage of the leader, a bifunctional 
protein is produced having a binding region whose 
specificity is determined by the CDRs, and a 
peptide-linked independently functional effector 
region. 
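
The order of the encoded segments described above 
(leader with cleavage site, V H -like domain, linker, 
V L -like domain, spacer, effector) can be sketched 
at the amino acid level as follows. This is a 
schematic illustration only: every sequence below is 
a short placeholder invented for this sketch, none is 
taken from the figures, and only the factor Xa 
recognition sequence (Ile-Glu-Gly-Arg) is drawn from 
the text. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    residues: str  # one-letter amino acid codes


def assemble(segments):
    """Concatenate segments in order, N-terminus to C-terminus."""
    return "".join(segment.residues for segment in segments)


def remove_leader(protein, cleavage_site):
    """Return what follows the first occurrence of the cleavage site."""
    head, site, tail = protein.partition(cleavage_site)
    if not site:
        raise ValueError("cleavage site not found")
    return tail


# Placeholder sequences, hypothetical and for illustration only.
CONSTRUCT = [
    Segment("leader", "MKAIFVLKGS"),
    Segment("cleavage site", "IEGR"),
    Segment("VH-like domain", "QVQLQQ"),
    Segment("linker", "GGGGSGGGGSGGGGS"),
    Segment("VL-like domain", "DVVMTQ"),
    Segment("spacer", "SSTSEK"),
    Segment("effector", "ADNKFN"),
]

expressed = assemble(CONSTRUCT)
mature = remove_leader(expressed, "IEGR")
print(expressed)
print(mature)
```

The sketch models cleavage as a simple string split 
and ignores folding; it is meant only to make the 
segment order concrete. 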

The ability to design the BABS of the 
invention depends on the ability to determine the 
sequence of the amino acids in the variable region of 
monoclonal antibodies of interest, or the DNA 
encoding them. Hybridoma technology enables 
production of cell lines secreting antibody to 
essentially any desired substance that produces an 
immune response. RNA encoding the light and heavy 
chains of the immunoglobulin can then be obtained 
from the cytoplasm of the hybridoma. The 5' end 
portion of the mRNA can be used to prepare cDNA for 
subsequent sequencing, or the amino acid sequence of 
the hypervariable and flanking framework regions can 
be determined by amino acid sequencing of the V 
region fragments of the H and L chains. Such 
sequence analysis is now conducted routinely. This 
knowledge, coupled with observations and deductions 
of the generalized structure of immunoglobulin Fvs, 
permits one to design synthetic genes encoding FR and 
CDR sequences which likely will bind the antigen. 
These synthetic genes are then prepared using known 
techniques, or using the technique disclosed below, 
inserted into a suitable host, and expressed, and the 
expressed protein is purified. Depending on the host 
cell, renaturation techniques may be required to 
attain proper conformation. The various proteins are 
then tested for binding ability, and one having 
appropriate affinity is selected for incorporation 
into a reagent of the type described above. If 
necessary, point substitutions seeking to optimize 
binding may be made in the DNA using conventional 
cassette mutagenesis or other protein engineering 
methodology such as is disclosed below. 

Preparation of the proteins of the invention 
also is dependent on knowledge of the amino acid 
sequence (or corresponding DNA or RNA sequence) of 
bioactive proteins such as enzymes, toxins, growth 
factors, cell differentiation factors, receptors, 
anti-metabolites, hormones or various cytokines or 
lymphokines. Such sequences are reported in the 
literature and available through computerized data 
banks. 

The DNA sequences of the binding site and 
the second protein domain are fused using 
conventional techniques, or assembled from 
synthesized oligonucleotides, and then expressed 
using equally conventional techniques. 

The processes for manipulating, amplifying, 
and recombining DNA which encodes amino acid sequences 
of interest are generally well known in the art, and 
therefore are not described in detail herein. Methods 
of identifying and isolating genes encoding 
antibodies of interest are well understood, and 
described in the patent and other literature. In 
general, the methods involve selecting genetic 
material coding for amino acids which define the 
proteins of interest, including the CDRs and FRs of 
interest, according to the genetic code. 

Accordingly, the construction of DNAs 
encoding proteins as disclosed herein can be done 
using known techniques involving the use of various 
restriction enzymes which make sequence specific cuts 
in DNA to produce blunt ends or cohesive ends, DNA 
ligases, techniques enabling enzymatic addition of 
sticky ends to blunt-ended DNA, construction of 
synthetic DNAs by assembly of short or medium length 
oligonucleotides, cDNA synthesis techniques, and 
synthetic probes for isolating immunoglobulin or 
other bioactive protein genes. Various promoter 
sequences and other regulatory DNA sequences used in 
achieving expression, and various types of host cells 
are also known and available. Conventional 
transfection techniques, and equally conventional 
techniques for cloning and subcloning DNA are useful 
in the practice of this invention and known to those 
skilled in the art. Various types of vectors may be 
used such as plasmids and viruses including animal 
viruses and bacteriophages. The vectors may exploit 
various marker genes which impart to a successfully 
transfected cell a detectable phenotypic property 
that can be used to identify which of a family of 
clones has successfully incorporated the recombinant 
DNA of the vector. 

One method for obtaining DNA encoding the 
proteins disclosed herein is by assembly of synthetic 
oligonucleotides produced in a conventional, 
automated, polynucleotide synthesizer followed by 
ligation with appropriate ligases. For example, 
overlapping, complementary DNA fragments comprising 
15 bases may be synthesized semimanually using 
phosphoramidite chemistry, with end segments left 
unphosphorylated to prevent polymerization during 
ligation. One end of the synthetic DNA is left with 
a "sticky end" corresponding to the site of action of 
a particular restriction endonuclease, and the other 
end is left with an end corresponding to the site of 
action of another restriction endonuclease. 
Alternatively, this approach can be fully automated. 
The DNA encoding the protein may be created by 
synthesizing longer single strand fragments (e.g., 
50-100 nucleotides long) in, for example, a Biosearch 
oligonucleotide synthesizer, and then ligating the 
fragments. 
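
The assembly of a gene from overlapping, 
complementary oligonucleotides described above can be 
sketched as follows. This is a simplified sketch, not 
the procedure actually used: the function names and 
the example sequence are assumptions, the bottom 
strand oligonucleotides are simply offset by half an 
oligonucleotide so that each bridges a junction in the 
top strand, and phosphorylation and the particular 
restriction site overhangs are not modelled. 

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (5' to 3')."""
    return dna.translate(COMPLEMENT)[::-1]


def split_overlapping(top, size=15):
    """Split a duplex into top strand oligos and offset bottom strand oligos.

    Each bottom strand oligo spans a junction between two top strand
    oligos, so annealing holds neighbouring fragments together for
    ligation. The first `size // 2` bases of the top strand are left
    unpaired, giving a single stranded 5' overhang at that end.
    """
    offset = size // 2
    top_oligos = [top[i:i + size] for i in range(0, len(top), size)]
    bottom_oligos = [reverse_complement(top[i:i + size])
                     for i in range(offset, len(top), size)]
    return top_oligos, bottom_oligos


# Hypothetical 45 base sequence used only to exercise the functions.
gene = "ATGGAAGTTCAACTGCAACAGTCTGGTCCTGAATTGGTTAAACCT"
top_oligos, bottom_oligos = split_overlapping(gene)
for oligo in top_oligos + bottom_oligos:
    print(oligo)
```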

A method of producing the BABS of the 
invention is to produce a synthetic DNA encoding a 
polypeptide comprising, e.g., human FRs, and 
intervening "dummy" CDRs, or amino acids having no 
function except to define suitably situated unique 
restriction sites. This synthetic DNA is then 
altered by DNA replacement, in which restriction and 
ligation is employed to insert synthetic 
oligonucleotides encoding CDRs defining a desired 
binding specificity in the proper location between 
the FRs. This approach facilitates empirical 
refinement of the binding properties of the BABS. 

This technique is dependent upon the ability 
to cleave a DNA corresponding in structure to a 
variable domain gene at specific sites flanking 
nucleotide sequences encoding CDRs. These 
restriction sites in some cases may be found in the 
native gene. Alternatively, non-native restriction 
sites may be engineered into the nucleotide sequence 
resulting in a synthetic gene with a different 
sequence of nucleotides than the native gene, but 
encoding the same variable region amino acids because 
of the degeneracy of the genetic code. The fragments 
resulting from endonuclease digestion, and comprising 
FR-encoding sequences, are then ligated to non-native 
CDR-encoding sequences to produce a synthetic 
variable domain gene with altered antigen-binding 
specificity. Additional nucleotide sequences 
encoding, for example, constant region amino acids or 
a bioactive molecule may then be linked to the gene 
sequences to produce a bifunctional protein. 
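
The use of the degeneracy of the genetic code to 
engineer a non-native restriction site without 
changing the encoded amino acids can be illustrated 
with a short sketch. The example below is an 
assumption chosen for illustration, not a site taken 
from the figures: the dipeptide Gly-Ser may be encoded 
by GGTAGC or, equally, by GGATCC, which is the 
recognition sequence of the restriction endonuclease 
BamHI. The function names are likewise hypothetical. 

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of 3."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def silent_variants_with_site(dna, site):
    """Yield DNAs encoding the same amino acids as `dna` that contain `site`.

    Exhaustive enumeration; practical only for short segments.
    """
    protein = translate(dna)
    choices = [[codon for codon, aa in CODON_TABLE.items() if aa == residue]
               for residue in protein]
    for codons in product(*choices):
        candidate = "".join(codons)
        if site in candidate:
            yield candidate


native = "GGTAGC"   # Gly-Ser, no BamHI site
print(translate(native))
print(list(silent_variants_with_site(native, "GGATCC")))
```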

The expression of these synthetic DNAs can 
be achieved in both prokaryotic and eucaryotic 
systems via transfection with an appropriate vector. 
In E. coli and other microbial hosts, the synthetic 
genes can be expressed as a fusion protein which is 
subsequently cleaved. Expression in eucaryotes can 
be accomplished by the transfection of DNA sequences 
encoding CDR and FR region amino acids and the amino 
acids defining a second function into a myeloma or 
other type of cell line. By this strategy intact 
hybrid antibody molecules having hybrid Fv regions 
and various bioactive proteins including a 
biosynthetic binding site may be produced. For 
fusion proteins expressed in bacteria, subsequent 
proteolytic cleavage of the isolated fusions can be 
performed to yield free BABS, which can be renatured 
to obtain an intact biosynthetic, hybrid antibody 
binding site. 

Heretofore, it has not been possible to 
cleave the heavy and light chain region to separate 
the variable and constant regions of an 
immunoglobulin so as to produce intact Fv, except in 
specific cases not of commercial utility. However, 
one method of producing BABS in accordance with this 
invention is to redesign DNAs encoding the heavy and 
light chains of an immunoglobulin, optionally 
altering its specificity or humanizing its FRs, and 
incorporating a cleavage site and "hinge region" 
between the variable and constant regions of both the 
heavy and light chains. Such chimeric antibodies can 
be produced in transfectomas or the like and 
subsequently cleaved using a preselected 
endopeptidase. 

The hinge region is a sequence of amino 
acids which serves to promote efficient cleavage by a 
preselected cleavage agent at a preselected, built-in 
cleavage site. It is designed to promote cleavage 
preferentially at the cleavage site when the 
polypeptide is treated with the cleavage agent in an 
appropriate environment. 

The hinge region can take many different 
forms. Its design involves selection of amino acid 
residues (and a DNA fragment encoding them) which 
impart to the region of the fused protein about the 
cleavage site an appropriate polarity, charge 
distribution, and stereochemistry which, in the 
aqueous environment where the cleavage takes place, 
efficiently exposes the cleavage site to the cleavage 
agent in preference to other potential cleavage sites 
that may be present in the polypeptide, and/or to 
improve the kinetics of the cleavage reaction. In 
specific cases, the amino acids of the hinge are 
selected and assembled in sequence based on their 
known properties, and then the fused polypeptide 
sequence is expressed, tested, and altered for 
refinement. 

The hinge region is free of cysteine. This 
enables the cleavage reaction to be conducted under 
conditions in which the protein assumes its tertiary 
conformation, and may be held in this conformation by 
intramolecular disulfide bonds. It has been 
discovered that in these conditions access of the 
protease to potential cleavage sites which may be 
present within the target protein is hindered. The 
hinge region may comprise an amino acid sequence 
which includes one or more proline residues. This 
allows formation of a substantially unfolded 
molecular segment. Aspartic acid, glutamic acid, 
arginine, lysine, serine, and threonine residues 
maximize ionic interactions and may be present in 
amounts and/or in sequence which renders the moiety 
comprising the hinge water soluble. 

The cleavage site preferably is immediately 
adjacent the Fv polypeptide chains and comprises one 
amino acid or a sequence of amino acids exclusive of 
any sequence found in the amino acid structure of the 
chains in the Fv. The cleavage site preferably is 
designed for unique or preferential cleavage by a 
specific selected agent. Endopeptidases are 
preferred, although non-enzymatic (chemical) cleavage 
agents may be used. Many useful cleavage agents, for 
instance, cyanogen bromide, dilute acid, trypsin, 
Staphylococcus aureus V-8 protease, post proline 
cleaving enzyme, blood coagulation Factor Xa, 
enterokinase, and renin, recognize and preferentially 
or exclusively cleave particular cleavage sites. One 
currently preferred cleavage agent is V-8 protease. 
The currently preferred cleavage site is a Glu 
residue. Other useful enzymes recognize multiple 
residues as a cleavage site, e.g., factor Xa 
(Ile-Glu-Gly-Arg) or enterokinase 
(Asp-Asp-Asp-Asp-Lys). The principles of this 
selective cleavage approach may also be used in the 
design of the linker and spacer sequences of the 
multifunctional constructs of the invention where an 
excisable linker or selectively cleavable linker or 
spacer is desired. 
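
The requirement that a cleavage site be exclusive of 
any sequence found in the Fv chains can be checked 
with a short routine. The sketch below is 
illustrative: the function names and the example 
sequences are assumptions, and only the recognition 
sequences for V-8 protease (Glu), factor Xa 
(Ile-Glu-Gly-Arg) and enterokinase 
(Asp-Asp-Asp-Asp-Lys) are taken from the text above. 

```python
# Recognition sequences named in the text, in one-letter code.
CLEAVAGE_SITES = {
    "V-8 protease": "E",
    "factor Xa": "IEGR",
    "enterokinase": "DDDDK",
}


def find_sites(seq, motif):
    """Return the 0-based start of every occurrence of `motif` in `seq`."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def site_is_exclusive(fv_chains, agent):
    """True if the agent's recognition sequence is absent from every chain."""
    motif = CLEAVAGE_SITES[agent]
    return all(motif not in chain for chain in fv_chains)


# Hypothetical fragments standing in for Fv chain sequences.
chains = ["QVELQQ", "DVVMTQ"]
for agent in CLEAVAGE_SITES:
    print(agent, site_is_exclusive(chains, agent))
print(find_sites("AEKEIEGR", "E"))
```

A single-residue site such as Glu is far more likely 
to occur within the chains than a multi-residue 
recognition sequence, which is one reason such a 
check is useful at the design stage. 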

Design of Synthetic V H and V L Mimics 

FRs from the heavy and light chain murine 
anti-digoxin monoclonal 26-10 (Figures 4A and 4B) 
were encoded on the same DNAs with CDRs from the 
murine anti-lysozyme monoclonal glp-4 heavy chain 
(Figure 3 sequence 1) and light chain to produce V H 
(Figure 4C) and V L (Figure 4D) regions together 
defining a biosynthetic antibody binding site which 
is specific for lysozyme. Murine CDRs from both the 
heavy and light chains of monoclonal glp-4 were 
encoded on the same DNAs with FRs from the heavy and 
light chains of human myeloma antibody NEWM (Figures 
4E and 4F). The resulting interspecies chimeric 
antibody binding domain has reduced immunogenicity in 
humans because of its human FRs, and specificity for 
lysozyme because of its murine CDRs. 

A synthetic DNA was designed to facilitate 
CDR insertions into a human heavy chain FR and to 
facilitate empirical refinement of the resulting 
chimeric amino acid sequence. This DNA is depicted 
in Figure 5. 

A synthetic, bifunctional FB-binding site 
protein was also designed at the DNA level, 
expressed, purified, renatured, and shown to bind 
specifically with a preselected antigen (digoxin) and 
Fc. The detailed primary structure of this construct 
is shown in Figure 6; its tertiary structure is 
illustrated schematically in Figure 2B. 

Details of these and other experiments, and 
additional design principles on which the invention 
is based, are set forth below. 

GENE DESIGN AND EXPRESSION 

Given known variable region DNA sequences, 
synthetic V L and V H genes may be designed which 
encode native or near native FR and CDR amino acid 
sequences from an antibody molecule, each separated 
by unique restriction sites located as close to 
FR-CDR and CDR-FR borders as possible. 
Alternatively, genes may be designed which encode 
native FR sequences which are similar or identical to 
the FRs of an antibody molecule from a selected 
species, each separated by "dummy" CDR sequences 
containing strategically located restriction sites. 
These DNAs serve as starting materials for producing 
BABS, as the native or "dummy" CDR sequences may be 
excised and replaced with sequences encoding the CDR 
amino acids defining a selected binding site. 
Alternatively, one may design and directly synthesize 
native or near-native FR sequences from a first 
antibody molecule, and CDR sequences from a second 
antibody molecule. Any one of the V H and V L 
sequences described above may be linked together 
directly, via an amino acid chain or linker 
connecting the C-terminus of one chain with the 
N-terminus of the other. 

These genes, once synthesized, may be cloned 
with or without additional DNA sequences coding for, 
e.g., an antibody constant region, enzyme, or toxin, 
or a leader peptide which facilitates secretion or 
intracellular stability of a fusion polypeptide. The 
genes then can be expressed directly in an 
appropriate host cell, or can be further engineered 
before expression by the exchange of FR, CDR, or 
"dummy" CDR sequences with new sequences. This 
manipulation is facilitated by the presence of the 
restriction sites which have been engineered into the 
gene at the FR-CDR and CDR-FR borders. 
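
The exchange of a "dummy" CDR for a new CDR-encoding 
sequence between two unique restriction sites can be 
sketched at the DNA level as follows. This is a 
simplified sketch under stated assumptions: the 
function name and the example gene are hypothetical, 
the BamHI (GGATCC) and EcoRI (GAATTC) sites are chosen 
only for illustration, and the recognition sequences 
are treated as fixed boundaries rather than modelling 
the actual cut positions and overhangs. 

```python
def replace_cassette(gene, left_site, right_site, insert):
    """Replace the DNA between two unique restriction sites.

    Both recognition sequences are retained; only the sequence lying
    between them is exchanged for `insert`.
    """
    if gene.count(left_site) != 1 or gene.count(right_site) != 1:
        raise ValueError("each restriction site must occur exactly once")
    start = gene.index(left_site) + len(left_site)
    end = gene.index(right_site)
    if end < start:
        raise ValueError("left site must precede right site")
    return gene[:start] + insert + gene[end:]


# Hypothetical gene: FR-encoding flanks around a dummy CDR (AAAAAA).
dummy_gene = "ATGCAG" + "GGATCC" + "AAAAAA" + "GAATTC" + "TGGGTG"
new_cdr = "ACTTTCACC"  # placeholder CDR-encoding insert
print(replace_cassette(dummy_gene, "GGATCC", "GAATTC", new_cdr))
```

A fuller treatment would also confirm that the insert 
preserves the reading frame and introduces no 
additional copies of either site. 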

Figure 3 illustrates the general approach to 
designing a chimeric V H; further details of 
exemplary designs at the DNA level are shown in 
Figures 4A-4F. Figure 3, lines 1 and 2, show the 
amino acid sequences of the heavy chain variable 
region of the murine monoclonals glp-4 
(anti-lysozyme) and 26-10 (anti-digoxin), including 
the four FR and three CDR sequences of each. Line 3 
shows the sequence of a chimeric V H which comprises 
26-10 FRs and glp-4 CDRs. As illustrated, the hybrid 
protein of line 3 is identical to the native protein 
of line 2, except that 1) the sequence TFTNYYIHWLK 
has replaced the sequence IFTDFYMNWVR, 2) 
EWIGWIYPGNGNTKYNENFKG has replaced 
DYIGYISPYSGVTGYNQKFKG, 3) RYTHYYF has replaced 
GSSGNKWAM, and 4) A has replaced V as the sixth amino 
acid beyond CDR-2. These changes have the effect of 
changing the specificity of the 26-10 V H to mimic 
the specificity of glp-4. The Ala to Val single 
amino acid replacement within the relatively 
conserved framework region of 26-10 is an example of 
the replacement of an amino acid outside the 
hypervariable region made for the purpose of altering 
specificity by CDR replacement. Beneath sequence 3 
of Figure 3, the restriction sites in the DNA 
encoding the chimeric V H (see Figures 4A-4F) are 
shown which are disposed about the CDR-FR borders. 
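
The segment replacements listed for line 3 of Figure 3 can be sketched 
at the protein level as plain string substitution. This is only an 
illustration: the three segment pairs are the ones quoted in the text, 
but the "<FR1>" to "<FR4>" placeholders and the names graft_segments, 
SEGMENT_SWAPS, and toy_26_10 are hypothetical, and the single framework 
substitution beyond CDR-2 is not modeled. 

```python
# Segment swaps quoted in the text: (26-10 segment, glp-4 replacement).
SEGMENT_SWAPS = [
    ("IFTDFYMNWVR", "TFTNYYIHWLK"),
    ("DYIGYISPYSGVTGYNQKFKG", "EWIGWIYPGNGNTKYNENFKG"),
    ("GSSGNKWAM", "RYTHYYF"),
]


def graft_segments(acceptor, swaps):
    """Replace each acceptor segment with its donor counterpart.

    Every acceptor segment must occur exactly once, so that each
    replacement is unambiguous.
    """
    grafted = acceptor
    for old, new in swaps:
        if grafted.count(old) != 1:
            raise ValueError(f"segment {old!r} must occur exactly once")
        grafted = grafted.replace(old, new)
    return grafted


# Hypothetical placeholders stand in for the four framework regions;
# the real 26-10 framework sequences are those shown in Figure 3.
toy_26_10 = "<FR1>IFTDFYMNWVR<FR2>DYIGYISPYSGVTGYNQKFKG<FR3>GSSGNKWAM<FR4>"
toy_chimera = graft_segments(toy_26_10, SEGMENT_SWAPS)
print(toy_chimera)
```

The uniqueness check plays the same role at the sequence level that 
unique restriction sites play at the DNA level. 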

Lines 4 and 5 of Figure 3 represent another 
construct. Line 4 is the full length V H of the 
human antibody NEWM. That human antibody may be made 
specific for lysozyme by CDR replacement as shown in 
line 5. Thus, for example, the segment TFTNYYIHWLK 
from glp-4 replaces TFSNDYYTWVR of NEWM, and its 
other CDRs are replaced as shown. This results in a 
V H comprising a human framework with murine 
sequences determining specificity. 

By sequencing any antibody, or obtaining the 
sequence from the literature, in view of this 
disclosure one skilled in the art can produce a BABS 
of any desired specificity comprising any desired 
framework region. Diagrams such as Figure 3 
comparing the amino acid sequence are valuable in 
suggesting which particular amino acids should be 
replaced to determine the desired complementarity. 
Expressed sequences may be tested for binding and 
refined by exchanging selected amino acids in 
relatively conserved regions, based on observation of 
trends in amino acid sequence data and/or computer 
modeling techniques. 

Significant flexibility in V H and V L 
design is possible because the amino acid sequences 
are determined at the DNA level, and the manipulation 
of DNA can be accomplished easily. 

For example, the DNA sequence for murine V H 
and V L 26-10 containing specific restriction sites 
flanking each of the three CDRs was designed with the 
aid of a commercially available computer program 
which performs combined reverse translation and 
restriction site searches ("RV.exe" by Compugene, 
Inc.). The known amino acid sequences for V H and 
V L 26-10 polypeptides were entered, and all 
potential DNA sequences which encode those peptides 
and all potential restriction sites were analyzed by 
the program. The program can, in addition, select 
DNA sequences encoding the peptide using only codons 
preferred by E. coli if this bacterium is to be the host 
expression organism of choice. Figures 4A and 4B 
show an example of program output. The nucleic acid 
sequences of the synthetic gene and the corresponding 
amino acids are shown. Sites of restriction 
endonuclease cleavage are also indicated. The CDRs 
of these synthetic genes are underlined. 

The DNA sequences for the synthetic 26-10 
V H and V L are designed so that one or both of the 
restriction sites flanking each of the three CDRs are 
unique. A six base site (such as that recognized by 
Bsm I or BspM I) is preferred, but where six base 
sites are not possible, four or five base sites are 
used. These sites, if not already unique, are 
rendered unique within the gene by eliminating other 
occurrences within the gene without altering 
necessary amino acid sequences. Preferred cleavage 
sites are those that, once cleaved, yield fragments 
with sticky ends just outside of the boundary of the 
CDR within the framework. However, such ideal sites 
are only occasionally possible because the FR-CDR 
boundary is not an absolute one, and because the 
amino acid sequence of the FR may not permit a 
restriction site. In these cases, flanking sites in 
the FR which are more distant from the predicted 
boundary are selected. 
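
The combined reverse translation and restriction site search described 
above can be illustrated with a short sketch. It is not the "RV.exe" 
program, whose internals are not given here; it simply enumerates the 
synonymous codons of the standard genetic code and reports where a 
recognition sequence can be encoded without altering the peptide. The 
function name silent_site_positions and the short test peptides are 
assumptions made for illustration. 

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = {}
for i, aa in enumerate(AMINO):
    codon = BASES[i // 16] + BASES[(i // 4) % 4] + BASES[i % 4]
    CODONS.setdefault(aa, []).append(codon)


def silent_site_positions(peptide, site):
    """Return (nucleotide offset, encoding DNA) pairs at which `site`
    can be introduced using only synonymous codons for `peptide`."""
    hits = []
    for start in range(3 * len(peptide) - len(site) + 1):
        first = start // 3                      # first codon touched
        last = (start + len(site) - 1) // 3     # last codon touched
        window = peptide[first:last + 1]
        local = start - 3 * first               # offset inside window
        for choice in product(*(CODONS[aa] for aa in window)):
            dna = "".join(choice)
            if dna[local:local + len(site)] == site:
                hits.append((start, dna))
                break
    return hits


# Gly-Ser can carry a BamHI site (GGATCC) as GGA TCC.
print(silent_site_positions("GS", "GGATCC"))
# Glu-Phe inside a longer peptide can carry an EcoRI site (GAATTC).
print(silent_site_positions("AEFK", "GAATTC"))
```

A search of this kind also shows why an ideal site is not always 
available: for many peptide windows no combination of synonymous codons 
spells the recognition sequence, and a more distant flanking site must 
be chosen. 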

Figure 5 discloses the nucleotide and 
corresponding amino acid sequence (shown in standard 
single letter code) of a synthetic DNA comprising a 
master framework gene having the generic structure: 

R 1 -FR 1 -X 1 -FR 2 -X 2 -FR 3 -X 3 -FR 4 -R 2 

where R 1 and R 2 are restricted ends which are to 
be ligated into a vector, and X 1 , X 2 , and X 3 
are DNA sequences whose function is to provide 
convenient restriction sites for CDR insertion. This 
particular DNA has murine FR sequences and unique, 
6-base restriction sites adjacent the FR borders so 
that nucleotide sequences encoding CDRs from a 
desired monoclonal can be inserted easily. 
Restriction endonuclease digestion sites are 
indicated with their abbreviations; enzymes of choice 
for CDR replacement are underscored. Digestion of 
the gene with the following restriction endonucleases 
results in 3' and 5' ends which can easily be matched 
up with and ligated to native or synthetic CDRs of 
desired specificity: KpnI and BstXI are used for 
ligation of CDR 1 ; XbaI and DraI for CDR 2 ; and 
BssHII and ClaI for CDR 3 . 

OLIGONUCLEOTIDE SYNTHESIS 

The synthetic genes and DNA fragments 
designed as described above preferably are produced 
by assembly of chemically synthesized 
oligonucleotides. 15-100mer oligonucleotides may be 
synthesized on a Biosearch DNA Model 8600 
Synthesizer, and purified by polyacrylamide gel 
electrophoresis (PAGE) in Tris-Borate-EDTA buffer 
(TBE). The DNA is then electroeluted from the gel. 
Overlapping oligomers may be phosphorylated by T4 
polynucleotide kinase and ligated into larger blocks 
which may also be purified by PAGE. 

CLONING OF SYNTHETIC OLIGONUCLEOTIDES 

The blocks or the pairs of longer 
oligonucleotides may be cloned into E. coli using a 
suitable, e.g., pUC, cloning vector. Initially, this 
vector may be altered by single strand mutagenesis to 
eliminate residual six base altered sites. For 
example, V H may be synthesized and cloned into pUC 
as five primary blocks spanning the following 
restriction sites: 1. EcoRI to first NarI site; 2. 
first NarI to XbaI; 3. XbaI to SalI; 4. SalI to NcoI; 
5. NcoI to BamHI. These cloned fragments may then be 
isolated and assembled in several three-fragment 
ligations and cloning steps into the pUC8 plasmid. 
Desired ligations selected by PAGE are then 
transformed into, for example, E. coli strain JM83, 
and plated onto LB Ampicillin + Xgal plates according 
to standard procedures. The gene sequence may be 
confirmed by supercoil sequencing after cloning, or 
after subcloning into M13 via the dideoxy method of 
Sanger. 

PRINCIPLE OF CDR EXCHANGE 

Three CDRs (or alternatively, four FRs) can 
be replaced per V H or V L . In simple cases, this 
can be accomplished by cutting the shuttle pUC 
plasmid containing the respective genes at the two 
unique restriction sites flanking each CDR or FR, 
removing the excised sequence, and ligating the 
vector with a native nucleic acid sequence or a 
synthetic oligonucleotide encoding the desired CDR or 
FR. This three part procedure would have to be 
repeated three times for total CDR replacement and 
four times for total FR replacement. Alternatively, 
a synthetic nucleotide encoding two consecutive CDRs 
separated by the appropriate FR can be ligated to a 
pUC or other plasmid containing a gene whose 
corresponding CDRs and FR have been cleaved out. 
This procedure reduces the number of steps required 
to perform CDR and/or FR exchange. 
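
The cut-and-replace step described above can be sketched at the DNA 
level as follows. This is a simplified illustration only: sites are 
treated as plain text markers, sticky-end geometry is ignored, and the 
toy gene, the replacement cassette, and the function name swap_cassette 
are hypothetical. The EcoRI (GAATTC) and BamHI (GGATCC) recognition 
sequences are used merely as familiar examples of unique flanking 
sites. 

```python
def swap_cassette(gene, site_5, site_3, insert):
    """Replace the sequence lying between two unique flanking sites.

    Both recognition sequences are retained; only the DNA between
    them is exchanged for `insert`.
    """
    for site in (site_5, site_3):
        if gene.count(site) != 1:
            raise ValueError(f"site {site} is not unique in the gene")
    start = gene.index(site_5) + len(site_5)
    end = gene.index(site_3)
    if start > end:
        raise ValueError("flanking sites are in the wrong order")
    return gene[:start] + insert + gene[end:]


ECORI, BAMHI = "GAATTC", "GGATCC"

# Hypothetical gene: 5' flank, site, old cassette, site, 3' flank.
toy_gene = "ATGGCT" + ECORI + "AAACCCTTT" + BAMHI + "GCTTAA"
new_gene = swap_cassette(toy_gene, ECORI, BAMHI, "CATCATCAT")
print(new_gene)
```

Repeating the call with a different pair of sites corresponds to 
exchanging the next CDR or FR; passing an insert that spans two CDRs 
and the intervening FR corresponds to the shortened procedure. 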

EXPRESSION OF PROTEINS 

The engineered genes can be expressed in 
appropriate prokaryotic hosts such as various strains 
of E. coli, and in eukaryotic hosts such as Chinese 
hamster ovary cells, murine myeloma, and human 
myeloma/transfectoma cells. 

For example, if the gene is to be expressed 
in E. coli, it may first be cloned into an expression 
vector. This is accomplished by positioning the 
engineered gene downstream from a promoter sequence 
such as trp or tac, and a gene coding for a leader 
peptide. The resulting expressed fusion protein 
accumulates in refractile bodies in the cytoplasm of 
the cells, and may be harvested after disruption of 
the cells by French press or sonication. The 
refractile bodies are solubilized, and the expressed 
proteins refolded and cleaved by the methods already 
established for many other recombinant proteins. 

If the engineered gene is to be expressed in 
myeloma cells, the conventional expression system for 
immunoglobulins, it is first inserted into an 
expression vector containing, for example, the Ig 
promoter, a secretion signal, immunoglobulin 
enhancers, and various introns. This plasmid may 
also contain sequences encoding all or part of a 
constant region, enabling an entire part of a heavy 
or light chain to be expressed. The gene is 
transfected into myeloma cells via established 
electroporation or protoplast fusion methods. Cells 
so transfected can express V L or V H fragments, 
V L2 or V H2 homodimers, V L -V H heterodimers, 
V H -V L or V L -V H single chain polypeptides, 
complete heavy or light immunoglobulin chains, or 
portions thereof, each of which may be attached in 
the various ways discussed above to a protein region 
having another function (e.g., cytotoxicity). 

Vectors containing a heavy chain V region 
(or V and C regions) can be cotransfected with 
analogous vectors carrying a light chain V region (or 
V and C regions), allowing for the expression of 
noncovalently associated binding sites (or complete 
antibody molecules). 

In the examples which follow, a specific 
example of how to make a single chain binding site is 
disclosed, together with methods employed to assess 
its binding properties. Thereafter, a protein 
construct having two functional domains is 
disclosed. Lastly, there is disclosed a series of 
additional targeted proteins which exemplify the 
invention. 

I. EXAMPLE OF CDR EXCHANGE AND EXPRESSION 

The synthetic genes coding for murine V H 
and V L 26-10 shown in Figures 4A and 4B were 
designed from the known amino acid sequence of the 
protein with the aid of Compugene, a software 
program. These genes, although coding for the native 
amino acid sequences, also contain non-native and 
often unique restriction sites flanking nucleic acid 
sequences encoding CDRs to facilitate CDR 
replacement as noted above. 

Both the 3' and 5' ends of the large 
synthetic oligomers were designed to include 6-base 
restriction sites present in the genes and the pUC 
vector. Furthermore, those restriction sites in the 
synthetic genes which were only suited for assembly 
but not for cloning in pUC were extended by "helper" 
cloning sites with matching sites in pUC. 

Cloning of the synthetic DNA and later 
assembly of the gene is facilitated by the spacing of 
unique restriction sites along the gene. This allows 
corrections and modifications by cassette mutagenesis 
at any location. Among them are alterations near the 
5' or 3' ends of the gene as needed for the 
adaptation to different expression vectors. For 
example, a PstI site is positioned near the 5' end of 
the V H gene. Synthetic linkers can be attached 
easily between this site and a restriction site in 
the expression plasmid. These genes were synthesized 
by assembling oligonucleotides as described above 
using a Biosearch Model 8600 DNA Synthesizer. They 
were ligated to vector pUC8 for transformation of E. 
coli. 

Specific CDRs may be cleaved from the 
synthetic gene by digestion with the following 
pairs of restriction endonucleases: HphI and BstXI 
for CDR 1 ; XbaI and DraI for CDR 2 ; and BanII and 
BanI for CDR 3 . After removal of one CDR, another 
CDR of desired specificity may be ligated directly 
into the restricted gene in its place if the 3' and 
5' ends of the restricted gene and the new CDR 
contain complementary single stranded DNA sequences. 

In the present example, the three CDRs of 
each of murine V H 26-10 and V L 26-10 were 
replaced with the corresponding CDRs of glp-4. The 
nucleic acid sequences and corresponding amino acid 
sequences of the chimeric V H and V L genes 
encoding the FRs of 26-10 and CDRs of glp-4 are shown 
in Figures 4C and 4D. The positions of the 
restriction endonuclease cleavage sites are noted 
with their standard abbreviations. CDR sequences are 
underlined as are the restriction endonucleases of 
choice useful for further CDR replacement. 

These genes were cloned into pUC8, a shuttle 
plasmid. To retain unique restriction sites after 
cloning, the V H -like gene was spliced into the 
EcoRI and HindIII or BamHI sites of the plasmid. 

Direct expression of the genes may be 
achieved in E. coli. Alternatively, the gene may be 
preceded by a leader sequence and expressed in E. 
coli as a fusion product by splicing the fusion gene 
into the host gene whose expression is regulated by 
interaction of a repressor with the respective 
operator. The protein can be induced by starvation 
in minimal medium and by chemical inducers. The 
V H -V L biosynthetic 26-10 gene has been expressed 
as such a fusion protein behind the trp and tac 
promoters. The gene translation product of interest 
may then be cleaved from the leader in the fusion 
protein by e.g., cyanogen bromide degradation, 
tryptic digestion, mild acid cleavage, and/or 
digestion with factor Xa protease. Therefore, a 
shuttle plasmid containing a synthetic gene encoding 
a leader peptide having a site for mild acid 
cleavage, and into which has been spliced the 
synthetic BABS gene was used for this purpose. In 
addition, synthetic DNA sequences encoding a signal 
peptide for secretion of the processed target protein 
into the periplasm of the host cell can also be 
incorporated into the plasmid. 

After harvesting the gene product and 
optionally releasing it from a fusion peptide, its 
activity as an antibody binding site and its 
specificity for glp-4 (lysozyme) epitope are assayed 
by established immunological techniques, e.g., 
affinity chromatography and radioimmunoassay. 
Correct folding of the protein to yield the proper 
three-dimensional conformation of the antibody 
binding site is prerequisite for its activity. This 
occurs spontaneously in a host such as a myeloma cell 
which naturally expresses immunoglobulin proteins. 
Alternatively, for bacterial expression, the protein 
forms inclusion bodies which, after harvesting, must 
be subjected to a specific sequence of solvent 
conditions (e.g., diluted 20 X from 8 M urea, 0.1 M 
Tris-HCl, pH 9, into 0.15 M NaCl, 0.01 M sodium 
phosphate, pH 7.4 (Hochman et al. (1976) Biochem. 
15: 2706-2710)) to assume its correct conformation and 
hence its active form. 

Figures 4E and 4F show the DNA and amino 
acid sequence of chimeric V H and V L comprising 
human FRs from NEWM and murine CDRs from glp-4. The 
CDRs are underlined, as are restriction sites of 
choice for further CDR replacement or empirically 
determined refinement. 

These constructs also constitute master 
framework genes, this time constructed of human 
framework sequences. They may be used to construct 
BABS of any desired specificity by appropriate CDR 
replacement. 

Binding sites with other specificities have 
also been designed using the methodologies disclosed 
herein. Examples include those having FRs from the 
human NEWM antibody and CDRs from murine 26-10 
(Figure 9A), murine 26-10 FRs and G-loop CDRs (Figure 
9B), FRs and CDRs from murine MOPC-315 (Figure 9C), 
FRs and CDRs from an anti-human carcinoembryonic 
antigen monoclonal antibody (Figure 9D), and FRs and 
CDRs 1, 2, and 3 from V L and FRs and CDRs 1 and 3 
from the V H of the anti-CEA antibody, with CDR 2 
from a consensus immunoglobulin gene (Figure 9E). 

II. Model Binding Site: 

The digoxin binding site of the IgG 2a k 
monoclonal antibody 26-10 has been analyzed by 
Mudgett-Hunter and colleagues (unpublished). The 
26-10 V region sequences were determined from both 
amino acid sequencing and DNA sequencing of 26-10 H 
and L chain mRNA transcripts (D. Panka, J.N. & 
M.N.M., unpublished data). The 26-10 antibody 
exhibits a high digoxin binding affinity 
[$K_o = 5.4 \times 10^{9}\ \mathrm{M}^{-1}$] and has a well-defined specificity 
profile, providing a baseline for comparison with the 
biosynthetic binding sites mimicking its structure. 



Protein Design: 

Crystallographically determined atomic 
coordinates for Fab fragments of 26-10 were obtained 
from the Brookhaven Data Bank. Inspection of the 
available three-dimensional structures of Fv regions 
within their parent Fab fragments indicated that the 
Euclidean distance between the C-terminus of the V H 
domain and the N-terminus of the V L domain is about 
35 Å. Considering that the peptide unit length is 
approximately 3.8 Å, a 15 residue linker was selected 
to bridge this gap. The linker was designed so as to 
exhibit little propensity for secondary structure and 
not to interfere with domain folding. Thus, the 15 
residue sequence (Gly-Gly-Gly-Gly-Ser) 3 was 
selected to connect the V H carboxyl- and V L 
amino-termini. 
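
The arithmetic behind the linker length can be checked with a few 
lines. The 35 Å gap, the 3.8 Å peptide unit length, and the 
(Gly-Gly-Gly-Gly-Ser) 3 sequence are taken from the text; the variable 
names are illustrative, and the calculation assumes a fully extended 
chain, which gives only an upper bound on the span. 

```python
import math

GAP_ANGSTROM = 35.0            # V H C-terminus to V L N-terminus
PEPTIDE_UNIT_ANGSTROM = 3.8    # approximate length per residue

# One-letter code for (Gly-Gly-Gly-Gly-Ser) repeated three times.
LINKER = "GGGGS" * 3

# Fewest residues that could span the gap if fully extended.
min_residues = math.ceil(GAP_ANGSTROM / PEPTIDE_UNIT_ANGSTROM)

# Maximum span of the chosen 15 residue linker.
max_span = len(LINKER) * PEPTIDE_UNIT_ANGSTROM

print(LINKER, len(LINKER))
print(min_residues, round(max_span, 1))
```

On these figures about 10 residues would be the bare minimum, and a 15 
residue linker can reach roughly 57 Å when extended, which leaves 
slack for the chain to follow a path around the domains rather than a 
straight line. 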

Binding studies with single chain binding 
sites having less than or greater than 15 residues 
demonstrate the importance of the prerequisite 
distance which must separate V H from V L ; for 
example, a (Gly^Ser^ linker does not 
demonstrate binding activity, and those with 
(Gly 4 -Ser) 5 linkers exhibit very low activity 
compared to those with (Gly 4 -Ser) 3 linkers. 

Gene Synthesis: 

Design of the 744 base sequence for the 
synthetic binding site gene was derived from the Fv 
protein sequence of 26-10 by choosing codons 
frequently used in E. coli. The model of this 
representative synthetic gene is shown in Figure 8, 
discussed previously. Synthetic genes coding for the 
trp promoter-operator, the modified trp LE leader 
peptide (MLE), the sequence of which is shown in 
Figure 10A, and V H were prepared largely as 
described previously. The gene coding for V H was 
assembled from 46 chemically synthesized 
oligonucleotides, all 15 bases long, except for 
terminal fragments (13 to 19 bases) that included 
cohesive cloning ends. Between 8 and 15 overlapping 
oligonucleotides were enzymatically ligated into 
double stranded DNA, cut at restriction sites 
suitable for cloning (NarI, XbaI, SalI, SacII, SacI), 
purified by PAGE on 8% gels, and cloned in pUC which 
was modified to contain additional cloning sites in 
the polylinker. The cloned segments were assembled 
stepwise into the complete gene mimicking V H by 
ligations in the pUC cloning vector. 
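
Assembly from overlapping 15 base oligonucleotides can be pictured 
with the following sketch, which tiles a top strand into 15-mers and 
generates staggered bottom strand oligomers so that each one bridges 
two neighbors on the opposite strand. The 45 base sequence, the 7 base 
stagger, and the function names are assumptions chosen for 
illustration; they are not the actual oligomers used for the V H gene. 

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (5' to 3')."""
    return dna.translate(_COMPLEMENT)[::-1]


def tile_oligos(top_strand, size=15, stagger=7):
    """Split a duplex into top and staggered bottom oligomers.

    Top oligomers tile the strand from position 0; bottom oligomers
    tile the complementary strand starting `stagger` bases in, so each
    bottom oligomer overlaps the junction between two top oligomers.
    """
    top = [top_strand[i:i + size]
           for i in range(0, len(top_strand), size)]
    bottom = [reverse_complement(top_strand[i:i + size])
              for i in range(stagger, len(top_strand), size)]
    return top, bottom


# Hypothetical 45 base stretch (15 codons), for illustration only.
toy_seq = "ATGGAAGTTCAACTGCAACAGTCTGGTCCTGAATTGGTTAAACCT"
top_oligos, bottom_oligos = tile_oligos(toy_seq)
for oligo in top_oligos + bottom_oligos:
    print(len(oligo), oligo)
```

The unpaired bases left at the 5' end of the first top oligomer stand 
in for a cohesive cloning end; in practice the terminal fragments are 
adjusted in length to match the chosen restriction site, which is 
consistent with the 13 to 19 base terminal fragments mentioned above. 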

The gene mimicking 26-10 V L was assembled 
from 12 long synthetic polynucleotides ranging in 
size from 33 to 88 base pairs, prepared in automated 
DNA synthesizers (Model 6500, Biosearch, San Rafael, 
CA; Model 380A, Applied Biosystems, Foster City, 
CA). Five individual double stranded segments were 
made out of pairs of long synthetic oligonucleotides 
spanning six-base restriction sites in the gene 
(AatII, BstEII, KpnI, HindIII, BglII, and PstI). In 
one case, four long overlapping strands were combined 
and cloned. Gene fragments bounded by restriction 
sites for assembly that were absent from the pUC 
polylinker, such as AatII and BstEII, were flanked by 
EcoRI and BamHI ends to facilitate cloning. 

The linker between V H and V L , encoding 
(Gly-Gly-Gly-Gly-Ser) 3 , was cloned from two long 
synthetic oligonucleotides, 54 and 62 bases long, 
spanning SacI and AatII sites, the latter followed by 
an EcoRI cloning end. The complete single chain 
binding site gene was assembled from the V H , V L , 
and linker genes to produce a construct, 
corresponding to aspartyl-prolyl-V H -<linker>-V L , 
flanked by EcoRI and PstI restriction sites. 

The trp promoter-operator, starting from its 
SspI site, was assembled from 12 overlapping 15 base 
oligomers, and the MLE leader gene was assembled from 
24 overlapping 15 base oligomers. These were cloned 
and assembled in pUC using the strategy of assembly 
sites flanked by cloning sites. The final expression 
plasmid was constructed in the pBR322 vector by a 
3-part ligation using the sites SspI, EcoRI, and PstI 
(see Figure 10B). Intermediate DNA fragments and 
assembled genes were sequenced by the dideoxy method. 

Fusion Protein Expression 

Single-chain protein was expressed as a 
fusion protein. The MLE leader gene (Fig. 10A) was 
derived from E. coli trp LE sequence and expressed 
under the control of a synthetic trp promoter and 
operator. E. coli strain JM83 was transformed with 
the expression plasmid and protein expression was 
induced in M9 minimal medium by addition of 
indoleacrylic acid (10 µg/ml) at a cell density 
with A 600 = 1. The high expression levels of the 
fusion protein resulted in its accumulation as 
insoluble protein granules, which were harvested from 
cell paste (Figure 11, Lane 1). 

Fusion Protein Cleavage: 

The MLE leader was removed from the binding 
site protein by acid cleavage of the Asp-Pro peptide 
bond engineered at the junction of the MLE and 
binding site sequences. The washed protein granules 
containing the fusion protein were cleaved in 6 M 
guanidine-HCl + 10% acetic acid, pH 2.5, incubated at 
37°C for 96 hrs. The reaction was stopped through 
precipitation by addition of a 10-fold excess of 
ethanol with overnight incubation at -20°C, followed 
by centrifugation and storage at -20°C until further 
purification (Figure 11, Lane 2). 

Protein Purification: 

The acid cleaved binding site was separated 
from remaining intact fused protein species by 
chromatography on DEAE cellulose. The precipitate 
obtained from the cleavage mixture was redissolved in 
6 M guanidine-HCl + 0.2 M Tris-HCl, pH 8.2, + 0.1 M 
2-mercaptoethanol and dialyzed exhaustively against 6 
M urea + 2.5 mM Tris-HCl, pH 7.5, + 1 mM EDTA. 
2-Mercaptoethanol was added to a final concentration 
of 0.1 M, the solution was incubated for 2 hrs at 
room temperature and loaded onto a 2.5 X 45 cm column 
of DEAE cellulose (Whatman DE 52), equilibrated with 
6 M urea + 2.5 mM Tris-HCl + 1 mM EDTA, pH 7.5. The 
intact fusion protein bound weakly to the DE 52 
column such that its elution was retarded relative to 
that of the binding protein. The first protein 
fractions which eluted from the column after loading 
and washing with urea buffer contained BABS protein 
devoid of intact fusion protein. Later fractions 
contaminated with some fused protein were pooled and 
rechromatographed on DE 52, and the recovered single 
chain binding protein was combined with other purified 
protein into a single pool (Figure 11, Lane 3). 

Refolding: 

The 26-10 binding site mimic was refolded as 
follows: the DE 52 pool, disposed in 6 M urea + 2.5 
mM Tris-HCl + 1 mM EDTA, was adjusted to pH 8 and 
reduced with 0.1 M 2-mercaptoethanol at 37°C for 90 
min. This was diluted at least 100-fold with 0.01 M 
sodium acetate, pH 5.5, to a concentration below 10 
µg/ml and dialyzed at 4°C for 2 days against 
acetate buffer. 

Affinity Chromatography: 

Purification of active binding protein by 
affinity chromatography at 4 °C on a 
ouabain-amine-Sepharose column was performed. The 
dilute solution of refolded protein was loaded 
directly onto a pair of tandem columns, each 
containing 3 ml of resin equilibrated with the 0.01 M 
acetate buffer, pH 5.5. The columns were washed 
individually with an excess of the acetate buffer, 
and then by sequential additions of 5 ml each of 1 M 
NaCl, 20 mM ouabain, and 3 M potassium thiocyanate 
dissolved in the acetate buffer, interspersed with 
acetate buffer washes. Since digoxin binding 
activity was still present in the eluate, the eluate 
was pooled and concentrated 20-fold by 
ultrafiltration (PM 10 membrane, 200 ml concentrator; 
Amicon), reapplied to the affinity columns, and 
eluted as described. Fractions with significant 
absorbance at 280 nm were pooled and dialyzed against 
PBSA or the above acetate buffer. The amounts of 
protein in the DE 52 and ouabain-Sepharose pools were 
quantitated by amino acid analysis following dialysis 
against 0.01 M acetate buffer. The results are shown 
below in Table 1. 

TABLE 1 

Estimated Yields of BABS Protein During Purification 

                          Wet wt.    mg              Cleavage yield    Yield relative 
Step                      per l      protein         (%) prior step    to fusion protein 

Cell paste                12.0 g     1440.0 mg a 
Fusion protein granules   2.3 g      480.0 mg a,b    100.0%            100.0% 
Acid cleavage/DE 52 pool             144.0 mg        38.0 e            38.0 e 
Ouabain-Sepharose pool               18.1 mg         12.6 d            4.7 e 

a Determined by Lowry protein analysis 
b Determined by absorbance measurements 
c Determined by amino acid analysis 
d Calculated from the amount of BABS protein 
  specifically eluted from ouabain-Sepharose relative 
  to that applied to the resin; values were determined 
  by amino acid analysis 
e Percentage yield calculated on a molar basis 
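
The relationship between the step yields and the overall yield in 
Table 1 can be checked by chaining the percentages. The two step 
yields (38.0% and 12.6%) are taken from the table; the variable names 
are illustrative. Because the table entries are rounded, the product 
is expected to agree with the reported overall figure only 
approximately. 

```python
# Step yields (%) relative to the preceding step, from Table 1.
step_yields_percent = {
    "acid cleavage / DE 52 pool": 38.0,
    "ouabain-Sepharose pool": 12.6,
}

overall_percent = 100.0  # fusion protein granules are the reference
for step, percent in step_yields_percent.items():
    overall_percent *= percent / 100.0
    print(f"{step}: {overall_percent:.2f}% of fusion protein")
```

The chained value comes to about 4.8%, close to the 4.7% reported in 
the last column. 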

Sequence Analysis of Gene and Protein: 

The complete gene was sequenced in both 
directions using the dideoxy method of Sanger which 
confirmed the gene was correctly assembled. The 
protein sequence was also verified by protein 
sequencing. Automated Edman degradation was conducted 
on intact protein (residues 1-40), as well as on two 
major CNBr fragments (residues 108-129 and 140-159) 
with a Model 470A gas phase sequencer equipped with a 
Model 120A on-line phenylthiohydantoin-amino acid 
analyzer (Applied Biosystems, Foster City, CA). 
Homogeneous binding protein, fractionated by SDS-PAGE 
and eluted from gel strips with water, was treated 
with a 20,000-fold excess of CNBr, in 1% 
trifluoroacetic acid-acetonitrile (1:1), for 12 hrs at 
25° (in the dark). The resulting fragments were 
separated by SDS-PAGE and transferred 
electrophoretically onto an Immobilon membrane 
(Millipore, Bedford, MA), from which stained bands 
were cut out and sequenced. 

Specificity Determination: 

Specificities of anti-digoxin 26-10 Fab and 
the BABS were assessed by radioimmunoassay. Wells of 
microtiter plates were coated with affinity-purified 
goat anti-murine Fab fragment (ICN ImmunoBiologicals, 
Lisle, IL) at 10 µg/ml in PBSA overnight at 4°C. 
After the plates were washed and blocked with 1% horse 
serum in PBSA, solutions (50 µl) containing 26-10 
Fab or the BABS in either PBSA or 0.01 M sodium 
acetate at pH 5.5 were added to the wells and 
incubated 2-3 hrs at room temperature. After unbound 
antibody fragment was washed from the wells, 25 µl 
of a series of concentrations of cardiac glycosides 
($10^{-4}$ to $10^{-11}$ M in PBSA) were added. The cardiac 
glycosides tested included digoxin, digitoxin, 
digoxigenin, digitoxigenin, gitoxin, ouabain, and 
acetyl strophanthidin. After the addition of 
125 I-digoxin (25 µl, 50,000 cpm; Cambridge 
Diagnostics, Billerica, MA) to each well, the plates 
were incubated overnight at 4°C, washed and counted. 
The inhibition curves are plotted in Figure 12. The 
relative affinities for each digoxin analogue were 
calculated by dividing the concentration of each 
analogue at 50% inhibition by the concentration of 
digoxin (or digoxigenin) that gave 50% inhibition. 
There is a displacement of inhibition curves for the 
BABS to lower glycoside concentrations than observed 
for 26-10 Fab, because less active BABS than 26-10 Fab 
was bound to the plate. When 0.25 M urea was added to 
the BABS in 0.01 M sodium acetate, pH 5.5, more active 
sFv was bound to the goat anti-murine Fab coating on 
the plate. This caused the BABS inhibition curves to 
shift toward higher glycoside concentrations, closer 
to the position of those for 26-10 Fab, although 
maintaining the relative positions of curves for sFv 
obtained in acetate buffer alone. The results, 
expressed as normalized concentration of inhibitor 
giving 50% inhibition of 125 I-digoxin binding, are 
shown in Table 2. 

TABLE 2 

26-10 
Antibody   Normalizing 
Species    Glycoside      D     DG    DO    DOG   A-S   G     O 

Fab        Digoxin        1.0   1.2   0.9   1.0   1.3   9.6   15 
           Digoxigenin    0.9   1.0   0.8   0.9   1.1   8.1   13 

BABS       Digoxin        1.0   7.3   2.0   2.6   5.9   62    150 
           Digoxigenin    0.1   1.0   0.3   0.4   0.8   8.5   21 

D = Digoxin 
DG = Digoxigenin 
DO = Digitoxin 
DOG = Digitoxigenin 
A-S = Acetyl Strophanthidin 
G = Gitoxin 
O = Ouabain 
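
The normalization used in Table 2 amounts to dividing every 50% 
inhibition concentration by that of the chosen reference glycoside. 
As a rough check, the digoxin-normalized BABS row can be renormalized 
to digoxigenin; the values are those of the table, while the function 
and variable names are illustrative. Since the table entries are 
already rounded, the result matches the digoxigenin-normalized row 
only to within rounding. 

```python
COLUMNS = ["D", "DG", "DO", "DOG", "A-S", "G", "O"]

# BABS row of Table 2, normalized to digoxin.
BABS_VS_DIGOXIN = [1.0, 7.3, 2.0, 2.6, 5.9, 62, 150]


def renormalize(values, columns, reference):
    """Divide every entry by the entry of the reference glycoside."""
    ref_value = values[columns.index(reference)]
    return {name: value / ref_value
            for name, value in zip(columns, values)}


babs_vs_digoxigenin = renormalize(BABS_VS_DIGOXIN, COLUMNS, "DG")
for name in COLUMNS:
    print(f"{name:>4}: {babs_vs_digoxigenin[name]:.2f}")
```
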

Affinity Determination: 

Association constants were measured by 
equilibrium binding studies. In immunoprecipitation 
experiments, 100 µl of 3 H-digoxin (New England 
Nuclear, Billerica, MA) at a series of concentrations 
($10^{-7}$ M to $10^{-11}$ M) were added to 100 µl of 
26-10 Fab or the BABS at a fixed concentration. 
After 2-3 hrs of incubation at room temperature, the 
protein was precipitated by the addition of 100 µl 
goat antiserum to murine Fab fragment (ICN 
ImmunoBiologicals), 50 µl of the IgG fraction of rabbit 
anti-goat IgG (ICN ImmunoBiologicals), and 50 µl of 
a 10% suspension of protein A-Sepharose (Sigma). 
Following 2 hrs at 4°C, bound and free antigen were 
separated by vacuum filtration on glass fiber filters 
(Vacuum Filtration Manifold, Millipore, Bedford, 
MA). Filter disks were then counted in 5 ml of 
scintillation fluid with a Model 1500 Tri-Carb Liquid 
Scintillation Analyzer (Packard, Sterling, VA). The 
association constants, K o , were calculated from 
Scatchard analyses of the untransformed radioligand 
binding data using LIGAND, a non-linear curve fitting 
program based on mass action. K o s were also 
calculated by Sips plots and binding isotherms shown 
in Figure 13A for the BABS and 13B for the Fab. For 
binding isotherms, data are plotted as the 
concentration of digoxin bound versus the log of the 
unbound digoxin concentration, and the dissociation 
constant is estimated from the ligand concentration 
at 50% saturation. These binding data are also 
plotted in linear form as Sips plots (inset), having 
the same abscissa as the binding isotherm but with 
the ordinate representing log r/(n-r), defined 
below. The average intrinsic association constant 
(K o ) was calculated from the modified Sips equation 
(39), $\log [r/(n-r)] = a \log C - a \log K_o$, where r 
equals moles of digoxin bound per mole of antibody at 
an unbound digoxin concentration equal to C; n is the 
number of moles of digoxin bound at saturation of the 
antibody binding site, and a is an index of 
heterogeneity which describes the distribution of 
association constants about the average intrinsic 
association constant K o . Least squares linear 
regression analysis of the data indicated correlation 
coefficients for the lines obtained were 0.96 for the 
BABS and 0.99 for 26-10 Fab. A summary of the 
calculated association constants is shown below in 
Table 3. 

TABLE 3

                     Association Constant, $K_o$

Method of Data
Analysis             $K_o$ (BABS), M$^{-1}$            $K_o$ (Fab), M$^{-1}$

Scatchard plot       $(3.2 \pm 0.9) \times 10^{7}$     $(1.9 \pm 0.2) \times 10^{8}$
Sips plot            $2.6 \times 10^{7}$               $1.8 \times 10^{8}$
Binding isotherm     $5.2 \times 10^{7}$               $3.3 \times 10^{8}$
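
The Sips analysis described above amounts to a straight-line fit of log [r/(n-r)] against log C. The following Python sketch illustrates such a fit on synthetic data generated inside the script; none of the numbers are measurements from these experiments. It assumes n = 1 site per molecule. Because sign conventions for the constant term vary between formulations, the sketch locates the free ligand concentration at which log [r/(n-r)] = 0 and reports its reciprocal as the association constant; that reading, the function name `sips_fit`, and all inputs are illustrative assumptions.

```python
import math

def sips_fit(free_conc, bound_per_site, n=1.0):
    """Least-squares line through (log10 C, log10[r/(n-r)]).

    Returns (a, k_assoc, corr): the slope a (heterogeneity index), the
    association constant in M^-1 taken as 1/C at half saturation, and
    the correlation coefficient of the fitted line.
    """
    xs = [math.log10(c) for c in free_conc]
    ys = [math.log10(r / (n - r)) for r in bound_per_site]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / math.sqrt(sxx * syy)
    log_c_half = -intercept / slope  # log10 C where r/(n-r) = 1
    return slope, 10 ** (-log_c_half), corr

# Synthetic, homogeneous site (a = 1) with an assumed K of 2.6e7 M^-1.
K_TRUE = 2.6e7
concs = [1e-9, 3e-9, 1e-8, 3e-8, 1e-7, 3e-7]            # free ligand, M
bound = [K_TRUE * c / (1 + K_TRUE * c) for c in concs]  # r at n = 1
a, k_assoc, corr = sips_fit(concs, bound)
print(f"a = {a:.2f}, K = {k_assoc:.2e} M^-1, correlation = {corr:.3f}")
```

With real data the slope falls below 1 when the binding sites are heterogeneous, and the correlation coefficient plays the same role as the 0.96 and 0.99 values quoted above.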



III. Synthesis of a Multifunctional Protein 

A nucleic acid sequence encoding the single
chain binding site described above was fused with a
sequence encoding the FB fragment of protein A as a
leader to function as a second active region. As a
spacer, the native amino acids comprising the last 11
amino acids of the FB fragment, bonded to an Asp-Pro
dilute acid cleavage site, were employed. The
binding domain of the FB consists of the immediately
preceding 43 amino acids, which assume a helical
configuration (see Fig. 2B).

The gene fragments are synthesized using a
Biosearch DNA Model 8600 Synthesizer as described
above. Synthetic oligonucleotides are cloned
according to the established protocol described above
using the pUC8 vector transfected into E. coli. The
completed fused gene set forth in Figure 6A is then
expressed in E. coli.

After sonication, inclusion bodies were 
collected by centrifugation, and dissolved in 6 M 
guanidine hydrochloride (GuHCl), 0.2 M Tris, and 0.1 M 
2-mercaptoethanol (BME) , pH 8.2. The protein was 
denatured and reduced in the solvent overnight at room 
temperature. Size exclusion chromatography was used 
to purify fusion protein from the inclusion bodies. A 
Sepharose 4B column (1.5 X 80 cm) was run in a solvent 
of 6 M GuHCl and 0.01 M NaOAc, pH 4.75. The protein
solution was applied to the column at room temperature
in 0.5-1.0 ml amounts. Fractions were collected and
precipitated with cold ethanol. These were run on SDS 
gels, and fractions rich in the recombinant protein 
(approximately 34,000 D) were pooled. This offers a 
simple first step for cleaning up inclusion body 
preparations without suffering significant proteolytic 
degradation. 
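
For orientation, the column dimensions given above fix the bed volume against which the 0.5-1.0 ml sample loads can be judged. The following Python sketch works through that arithmetic. It assumes that 1.5 cm is the internal diameter and 80 cm the packed bed height, which the text does not state explicitly; the function name is illustrative.

```python
import math

DIAMETER_CM = 1.5      # assumed to be the internal diameter
BED_HEIGHT_CM = 80.0   # assumed to be the packed bed height

def bed_volume_ml(diameter_cm, height_cm):
    """Volume of a cylindrical column bed; 1 cubic centimetre equals 1 ml."""
    radius = diameter_cm / 2.0
    return math.pi * radius ** 2 * height_cm

bed = bed_volume_ml(DIAMETER_CM, BED_HEIGHT_CM)
print(f"bed volume: {bed:.1f} ml")
for load_ml in (0.5, 1.0):  # sample volumes quoted in the text
    print(f"{load_ml} ml load = {100 * load_ml / bed:.2f}% of bed volume")
```

Under these assumptions each load is well under 1% of the bed volume.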

For refolding, the protein was dialyzed
against 100 ml of the same GuHCl-Tris-BME solution,
and dialysate was diluted 11-fold over two days to
0.55 M GuHCl, 0.01 M Tris, and 0.01 M BME. The
dialysis sacks were then transferred to 0.01 M NaCl,
and the protein was dialyzed exhaustively before being
assayed by RIA's for binding of ¹²⁵I-labelled
digoxin. The refolding procedure can be simplified by
making a rapid dilution with water to reduce the GuHCl
concentration to 1.1 M, and then dialyzing against
phosphate buffered saline (0.15 M NaCl, 0.05 M
potassium phosphate, pH 7, containing 0.03% NaN₃),
so that it is free of any GuHCl within 12 hours.
Product of both types of preparation showed binding
activity, as indicated in Figure 7A.
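
The dilution steps in the refolding procedure can be checked with a few lines of arithmetic. The Python sketch below assumes that the dialysate starts at the composition of the dissolving buffer given earlier (6 M GuHCl, 0.2 M Tris, 0.1 M BME) and that every component is diluted uniformly; the names `START_M` and `dilute` are illustrative only.

```python
START_M = {"GuHCl": 6.0, "Tris": 0.2, "BME": 0.1}  # dissolving buffer, mol/L

def dilute(concentrations, fold):
    """Return the concentrations after a uniform fold-dilution."""
    return {name: value / fold for name, value in concentrations.items()}

slow = dilute(START_M, 11)             # 11-fold dilution over two days
final_volume_ml = 100 * 11             # 100 ml of dialysate diluted 11-fold
rapid_fold = START_M["GuHCl"] / 1.1    # dilution needed to reach 1.1 M GuHCl

for name, value in slow.items():
    print(f"{name}: {value:.3f} M after 11-fold dilution")
print(f"final dialysate volume: {final_volume_ml} ml")
print(f"rapid dilution to 1.1 M GuHCl: {rapid_fold:.2f}-fold")
```

The GuHCl and BME values round to the stated 0.55 M and 0.01 M. The nominal Tris value computed from 0.2 M is about 0.018 M, whereas the text gives 0.01 M. The simplified procedure corresponds to roughly a 5.5-fold dilution with water.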

Demonstration of Bifunctionality:

This protein with an FB leader and a fused
BABS is bifunctional; the BABS can bind the antigen
and the FB can bind the Fc regions of
immunoglobulins. To demonstrate this dual and
simultaneous activity, several radioimmunoassays were
performed.

Properties of the binding site were probed by
a modification of an assay developed by Mudgett-Hunter
et al. (J. Immunol. (1982) 129:1165-1172; Molec.
Immunol. (1985) 22:477-488), so that it could be run
on microtiter plates as a solid phase sandwich assay.
Binding data were collected using goat anti-murine Fab
antisera (gAmFab) as the primary antibody that
initially coats the wells of the plate. These are
polyclonal antisera which recognize epitopes that
appear to reside mostly on framework regions. The
samples of interest are next added to the coated wells
and incubated with the gAmFab, which binds species
that exhibit appropriate antigenic sites. After
washing away unbound protein, the wells are exposed to
¹²⁵I-labelled (radioiodinated) digoxin conjugates,
either as ¹²⁵I-dig-BSA or ¹²⁵I-dig-lysine.

The data are plotted in Figure 7A, which
shows the results of a dilution curve experiment in
which the parent 26-10 antibody was included as a
control. The sites were probed with ¹²⁵I-dig-BSA as
described above, with a series of dilutions prepared
from initial stock solutions, including both the
slowly refolded (1) and fast diluted/quickly refolded
(2) single chain proteins. The parallelism between
all three dilution curves indicates that gAmFab
binding regions on the BABS molecule are essentially
the same as on the Fv of authentic 26-10 antibody,
i.e., the surface epitopes appear to be the same for
both proteins.

The sensitivity of these assays is such that
binding affinity of the Fv for digoxin must be at
least $10^{6}$ M$^{-1}$. Experimental data on digoxin binding
yielded binding constants in the range of $10^{8}$ to
$10^{9}$ M$^{-1}$. The parent 26-10 antibody has an
affinity of $5.4 \times 10^{9}$ M$^{-1}$. Inhibition assays also
indicate that the binding of ¹²⁵I-dig-lysine can be
inhibited by unlabelled digoxin, digoxigenin,
digitoxin, digitoxigenin, gitoxin, acetyl
strophanthidin, and ouabain in a way largely parallel
to the parent 26-10 Fab. This indicates that the
specificity of the biosynthetic protein is
substantially identical to that of the original monoclonal.
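
The affinities quoted above are most easily compared on a logarithmic scale. The short Python sketch below uses only figures stated in this section: the assay floor of 10^6 M^-1, the reported range of 10^8 to 10^9 M^-1, and the parent antibody's 5.4 x 10^9 M^-1. The helper name `orders_below` is an illustrative assumption.

```python
import math

PARENT_KA = 5.4e9    # M^-1, parent 26-10 antibody (from the text)
ASSAY_FLOOR = 1e6    # M^-1, minimum affinity detectable in these assays

def orders_below(ka, reference=PARENT_KA):
    """Orders of magnitude by which ka falls below the reference affinity."""
    return math.log10(reference / ka)

for ka in (1e8, 1e9):  # range reported for the single chain protein
    print(f"Ka = {ka:.0e} M^-1: {orders_below(ka):.2f} orders below parent, "
          f"above assay floor: {ka >= ASSAY_FLOOR}")
```

Both ends of the reported range lie within two orders of magnitude of the parent antibody and well above the sensitivity limit of the assay.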

In a second type of assay, digoxin-BSA is
used to coat microtiter plates. Renatured BABS
(FB-BABS) is added to the coated plates so that only
molecules that have a competent binding site can stick
to the plate. ¹²⁵I-labelled rabbit IgG
(radioligand) is mixed with bound FB-BABS on the
plates. Bound radioactivity reflects the interaction
of IgG with the FB domain of the BABS, and the
specificity of this binding is demonstrated by its
inhibition with increasing amounts of FB, Protein A,
rabbit IgG, IgG2a, and IgG1, as shown in Figure 7B.

The following species were tested in order to
demonstrate authentic binding: unlabelled rabbit IgG
and IgG2a monoclonal antibody (which bind
competitively to the FB domain of the BABS); and
protein A and FB (which bind competitively to the
radioligand). As shown in Figure 7B, these species
are found to completely inhibit radioligand binding,
as expected. A monoclonal antibody of the IgG1
subclass binds poorly to the FB, as expected,
inhibiting only about 34% of the radioligand from
binding. These data indicate that the BABS domain and
the FB domain have independent activity.
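
The inhibition figures reported above follow from a simple ratio of bound radioactivity with and without competitor. The Python sketch below shows one common way to compute it. The counts are hypothetical values chosen only to reproduce the 34% figure, and the function name and optional background term are assumptions rather than details taken from the assay description.

```python
def percent_inhibition(bound_with_inhibitor, bound_without_inhibitor, background=0.0):
    """Percent of specific radioligand binding blocked by a competitor."""
    specific = bound_without_inhibitor - background
    if specific <= 0:
        raise ValueError("uninhibited signal must exceed background")
    remaining = bound_with_inhibitor - background
    return 100.0 * (1.0 - remaining / specific)

# Hypothetical counts per minute, not data from Figure 7B.
uninhibited_cpm = 10000.0
with_igg1_cpm = 6600.0
print(f"{percent_inhibition(with_igg1_cpm, uninhibited_cpm):.0f}% inhibition")
```

Complete inhibition, as described for rabbit IgG, IgG2a, protein A, and FB, corresponds to the competitor reducing the bound counts to background.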

IV. OTHER CONSTRUCTS 

Other BABS-containing proteins constructed
according to the invention, expressible in E. coli and
other host cells as described above, are set forth in
the drawing. These proteins may be bifunctional or
multifunctional. Each construct includes a single
chain BABS linked via a spacer sequence to an effector
molecule comprising amino acids defining a
biologically active effector protein such as an
enzyme, receptor, toxin, or growth factor. Some
examples of such constructs shown in the drawing
include proteins comprising epidermal growth factor
(EGF) (Figure 15A), streptavidin (Figure 15B), tumor
necrosis factor (TNF) (Figure 15C), calmodulin (Figure
15D), the beta chain of platelet derived growth factor
(B-PDGF) (15E), ricin A (15F), interleukin 2 (15G), and
FB dimer (15H). Each is used as a trailer and is
connected to a preselected BABS via a spacer
(Gly-Ser-Gly) encoded by DNA defining a BamHI
restriction site. Additional amino acids may be added
to the spacer for empirical refinement of the
construct if necessary by opening up the BamHI site
and inserting an oligonucleotide of a desired length
having BamHI sticky ends. Each gene also terminates
with a PstI site to facilitate insertion into a
suitable expression vector.
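
The spacer design described above rests on two points that are easy to check: the BamHI recognition sequence GGATCC, read in frame, encodes Gly-Ser, and an oligonucleotide inserted at the opened site must add a multiple of three bases to keep the trailing effector in frame. The Python sketch below illustrates both. The spacer DNA `GGATCCGGT` and the inserted oligonucleotide are hypothetical examples, not sequences taken from the drawing, and the codon table is deliberately limited to Gly and Ser.

```python
BAMHI = "GGATCC"

# Partial codon table: only the Gly and Ser codons needed for this example.
GLY_SER_CODONS = {
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGT": "Gly",
    "TCA": "Ser", "TCC": "Ser", "TCG": "Ser", "TCT": "Ser",
    "AGC": "Ser", "AGT": "Ser",
}

def translate(dna):
    """Translate an in-frame Gly/Ser coding sequence."""
    if len(dna) % 3:
        raise ValueError("length is not a multiple of three")
    return "-".join(GLY_SER_CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))

def insert_at_bamhi(dna, top_strand_insert):
    """Open the first BamHI site (cut G^GATCC) and ligate in an oligonucleotide.

    top_strand_insert is the top strand of the insert, including its
    5' GATC sticky end.
    """
    if not top_strand_insert.startswith("GATC"):
        raise ValueError("insert lacks a BamHI-compatible 5' overhang")
    cut = dna.index(BAMHI) + 1
    return dna[:cut] + top_strand_insert + dna[cut:]

spacer = "GGATCCGGT"                               # hypothetical spacer DNA
print(translate(spacer))
longer = insert_at_bamhi(spacer, "GATCCGGTGGCG")   # hypothetical 12-base insert
print(longer, translate(longer))
```

In this example the 12-base insert lengthens the spacer by four residues and regenerates a BamHI site on each side, so the refinement step can be repeated.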

The BABS of the EGF and PDGF constructs may
be, for example, specific for fibrin so that the EGF
or PDGF is delivered to the site of a wound. The BABS
for TNF and ricin A may be specific to a tumor
antigen, e.g., CEA, to produce a construct useful in
cancer therapy. The calmodulin construct binds
radioactive ions and other metal ions. Its BABS may
be specific, for example, to fibrin or a tumor
antigen, so that it can be used as an imaging agent to
locate a thrombus or tumor. The streptavidin
construct binds biotin with very high affinity.
The biotin may be labeled with a remotely detectable
ion for imaging purposes. Alternatively, the biotin
may be immobilized on an affinity matrix or solid
support. The BABS-streptavidin protein could then be
bound to the matrix or support for affinity
chromatography or solid phase immunoassay. The
interleukin-2 construct could be linked, for example,
to a BABS specific for a T-cell surface antigen. The
FB-FB dimer binds to Fc, and could be used with a BABS
in an immunoassay or affinity purification procedure
linked to a solid phase through immobilized
immunoglobulin.

Figure 14 exemplifies a multifunctional 
protein having an effector segment as a leader. It 
comprises an FB-FB dimer linked through its C-terminus
via an Asp-Pro dipeptide to a BABS of choice. It
functions in a way very similar to the construct of 
Fig. 15H. The dimer binds avidly to the Fc portion of 
immunoglobulin. This type of construct can 
accordingly also be used in affinity chromatography, 
solid phase immunoassay, and in therapeutic contexts 
where coupling of immunoglobulins to another epitope 
is desired. 

In view of the foregoing, it should be 
apparent that the invention is unlimited with respect 
to the specific types of BABS and effector proteins to 
be linked. Accordingly, other embodiments are within 
the following claims.

What is claimed is:

Claims

1. A single chain multi-functional biosynthetic
protein expressed from a single gene derived by
recombinant DNA techniques, said protein comprising:

a biosynthetic antibody binding site capable 
of binding to a preselected antigenic determinant and 
comprising at least one protein domain, the amino 
acid sequence of said domain being homologous to at 
least a portion of the sequence of a variable region 
of an immunoglobulin molecule capable of binding said 
preselected antigenic determinant; and, peptide 
bonded thereto, 

a polypeptide selected from the group 
consisting of effector proteins having a conformation 
suitable for biological activity in mammals, amino 
acid sequences capable of sequestering an ion, and 
amino acid sequences capable of selective binding to 
a solid support.

2. The protein of claim 1 wherein said binding
site comprises at least two domains connected by
peptide bonds to a polypeptide linker.

3. The protein of claim 2 wherein said two
domains mimic a $V_H$ and a $V_L$ from a natural
immunoglobulin.

4. The protein of claim 2 wherein the amino
acid sequence of each of said domains comprises a set 
of CDRs interposed between a set of FRs, each of 
which is respectively homologous with at least a 
portion of CDRs and FRs from a said variable region 
of an immunoglobulin molecule capable of binding said 
preselected antigenic determinant. 

5. The protein of claim 4 wherein at least one 
of said domains comprises a said set of CDRs 
homologous to a portion of the CDRs in a first 
immunoglobulin and a set of FRs homologous to a 
portion of the FRs in a second, distinct
immunoglobulin. 

6. The protein of claim 2 wherein said
polypeptide linker spans a distance of at least 40
angstroms and is hydrophilic.

7. The protein of claim 2 wherein said 
polypeptide linker comprises amino acids which 
together assume an unstructured polypeptide 
configuration in aqueous solution. 

8. The protein of claim 2 wherein said
polypeptide linker is cysteine-free.

9. The protein of claim 2 wherein said
polypeptide linker comprises a plurality of glycine
or alanine residues.

10. The protein of claim 2 wherein said
polypeptide linker comprises plural consecutive
copies of an amino acid sequence.

11. The protein of claim 2 wherein said 
polypeptide linker comprises one or a pair of amino 
acid sequences recognizable by a site specific 
cleavage agent. 

12. The protein of claim 4 wherein said antibody
binding site binds with said antigenic determinant
with a specificity at least substantially identical
to the binding specificity of said immunoglobulin
molecule.

13. The protein of claim 4 wherein said antibody
binding site binds said antigenic determinant with an
affinity of at least about $10^{6}$ M$^{-1}$.

14. The protein of claim 4 wherein said antibody
binding site binds said antigenic determinant with an
affinity no less than about two orders of magnitude
less than the binding affinity of said immunoglobulin
molecule.

15. The protein of claim 1 further comprising a
polypeptide spacer incorporated therein interposed
between said antibody binding site and said
polypeptide.

16. The protein of claim 15 wherein said
polypeptide spacer comprises amino acids selectively
susceptible to cleavage.

17. The protein of claim 15 wherein said spacer
is hydrophilic.

18. The protein of claim 15 wherein said spacer 
comprises amino acids which together assume an 
unstructured polypeptide configuration in aqueous 
solution. 

19. The protein of claim 15 wherein said spacer
is cysteine-free.

20. The protein of claim 15 wherein said spacer 
comprises a plurality of glycine or alanine residues. 

21. The protein of claim 15 wherein said spacer 
comprises plural consecutive copies of an amino acid 
sequence. 

22. The protein of claim 1 wherein said effector
protein is an enzyme, toxin, receptor, binding site,
biosynthetic antibody binding site, growth factor,
cell-differentiation factor, lymphokine, cytokine,
hormone, or anti-metabolite.

23. The protein of claim 1 wherein said sequence
capable of sequestering an ion is calmodulin,
metallothionein, a fragment thereof, or an amino acid
sequence rich in at least one of glutamic acid,
aspartic acid, lysine, and arginine.

24. The protein of claim 1 wherein said
polypeptide sequence capable of selective binding to
a solid support is a positively or negatively charged
amino acid sequence, a cysteine-containing amino acid
sequence, streptavidin, or a fragment of protein A.

25. The protein of claim 1 comprising a
plurality of biosynthetic antibody binding sites.

26. The protein of claim 1 comprising an 
additional biofunctional domain. 

27. A DNA encoding the protein of claim 1.

28. A host cell harboring and capable of 
expressing the DNA of claim 27. 

29. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with at
least a portion of CDRs and FRs from an
immunoglobulin molecule,

at least one of said domains comprising a
said set of CDR amino acid sequences homologous to a
portion of the CDR amino acid sequences of a first
immunoglobulin molecule, and a set of FR amino acid
sequences homologous to a portion of the FR sequences
of a second, distinct immunoglobulin molecule,

said polypeptide domains together defining a
hybrid synthetic binding site having specificity for
a preselected antigen.

30. The binding protein of claim 29 wherein said 
domains comprise FRs homologous to a portion of the 
FRs of a human immunoglobulin. 

31. The binding protein of claim 29 wherein said 
polypeptide domains are peptide bonded to a 
biologically active amino acid sequence. 

32. The binding protein of claim 29 further 
comprising a radioactive atom bound to said binding 
protein. 

33. A DNA encoding the binding protein of claim 
32. 

34. A host cell harboring and capable of 
expressing the DNA of claim 33. 

35. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with at
least a portion of CDRs and FRs from an
immunoglobulin molecule,

said polypeptide linker comprising plural,
peptide-bonded amino acids defining a polypeptide of
a length sufficient to span the distance between the
C-terminal end of one of said domains and the
N-terminal end of the other of said domains when said
binding protein assumes a conformation suitable for
binding, and comprising hydrophilic amino acids which
together assume an unstructured polypeptide
configuration in aqueous solution,

said binding protein being capable of
binding to a preselected antigenic site, determined
by the collective tertiary structure of said sets of
CDRs held in proper conformation by said sets of FRs
and said linker when disposed in aqueous solution.

36. The binding protein of claim 35 wherein said
polypeptide linker spans a distance of at least about
40 Å when said binding protein is disposed in aqueous
solution in a conformation suitable for binding said
preselected antigen.

37. The binding protein of claim 35 wherein said
polypeptide linker comprises a plurality of glycine
or alanine residues.

38. The binding protein of claim 35 wherein said 
linker comprises plural consecutive copies of an 
amino acid sequence. 



39. The binding protein of claim 35 wherein said
linker comprises (Gly-Gly-Gly-Gly-Ser)$_3$.

40. The binding protein of claim 35 wherein at
least one of said domains comprises a said set of
CDRs homologous to a portion of the CDRs in a first
immunoglobulin and a set of FRs homologous to a
portion of the FRs of a second, distinct, human
immunoglobulin.

41. The binding protein of claim 35 wherein at 
least one of said polypeptide domains is peptide 
bonded to a biologically active amino acid sequence. 

42. The binding protein of claim 35 further 
comprising a radioactive atom bound to said 
polypeptide chain. 

43. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with
at least a portion of CDRs and FRs from an
immunoglobulin molecule,

said binding protein being capable of
binding to a preselected antigenic determinant,
determined by the collective tertiary structure of
said sets of CDRs held in proper conformation by said
sets of FRs when disposed in aqueous solution, with a
binding specificity at least substantially identical
to the binding specificity of said immunoglobulin
molecule comprising said homologous CDRs.

44. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with
at least a portion of CDRs and FRs from an
immunoglobulin molecule,

said binding protein being capable of
binding to a preselected antigenic determinant,
determined by the collective tertiary structure of
said sets of CDRs held in proper conformation by said
sets of FRs when disposed in aqueous solution, with a
binding affinity at least $10^{6}$ M$^{-1}$.

45. The binding protein of claim 43 or 44 having
a binding affinity at least about $10^{8}$ M$^{-1}$.

46. The binding protein of claim 43 or 44 having 
a binding affinity no less than two orders of 
magnitude less than the binding affinity of said 
immunoglobulin molecule comprising said homologous 
CDRs. 

47. The binding protein of claim 43 or 44
wherein at least one of said polypeptide domains is
peptide bonded to a biologically active amino acid
sequence.

48. The binding protein of claim 43 or 44
further comprising a radioactive atom bound to said
polypeptide chain.
CTTCCCCCTCTTCCCAGTCTCTCCTCCATTCTAATCCTAACACTTACCTCAACTOGTACCTCCAAAAOGC
erCy3ArgSerSerGlnSepL euyalHlaSer»3nClyA3nTl»pTvr L e uA aH Tr n Tv^t„„ri 

£w Maelll HglEII Baal 

Mboll BatXI 
Sau96l 




150 1*0 170 180 190 200 210 

T S? T -? CTCTCCOAAOCTTCTGATCTACAAilOTCTCTA * cc CCTTCTCTGOTGTCCCOGATCGTTTCTCT 
«GlyClnSerProl.yaLeuLemieTyr LyayalSerAanArgPheSap ClvValPpaAaBAt. g Ph«fi^ 
Alul Sau3A Hpall 
HlndXI1 NelISau3A 

SerFI 

220 230 280 250 260 - 270 280 

S? T I CT 5? TTCTCCTACTGACTTCACCCTCAAGATCTCTCCT « C C*00CCGACCATCTGCGTATCTACT 
GlySerGlySerGlyThrA3 P PheThrLeuLyaIleSerArgValGluAlaGluAapI.euGlyIleTyrP 
Baal HphI Bglll TaqlHaelll Sau3A 

MboXI XhoII 
Sau3A 
XhoII 

290 300 310 320 330 31,0 350 

TCTCCTCTCAGACTACTCATCTACCCCCGACCTTCGGCGGTGGCACCAAGCTCGAGATCAAACGTTGAGGATCC 
heCyaS ^ CanTh ^hrHlayalPr 0 Pr 0 ThrPh»civr.wr.i tfT K., r -.I!:^ *™ AAACC ^Ig AGGATCC 

£2£1 Mla "* M«XX Bani Alul S.U3A Maell BamHI 

BaaI H1*IV Aval HlalV 

T»QI 5au3A 

Xhol XhoZZ 

FlGn. H6 



10 20 30 40 50 60 70

CAATTCCJIACTTCAACTCCACCAGTCTCCTC&TCAATTCCTTAAACCTCCCCCCTCTCTCCCCATCTCCT 
GluPheCluValClnLeuGlnClnSerClyProCluLeuValLyaProClyAlaSerValArgMetScrC 

AauII Bbvl Avail Ahall Hhal 

ECORI Fnu4HI Sau96I BanI HinPi 

Taql PstI EcoRII MatlNlalll 

Haall Papi 
Hhal 
HinPI 
Narl 
NlaZV 
Acyl 

80 90 too no 120 130 uo 

GCAAATCCTCTGGC TACATTTTCACCAATTACTACATCCATTGCGTTCGCCAGTCTCA TGGTAACTCTCT 
CATGTAAA AGTGGTTAATGATGTAGGTAACCC A AGCGCTC 

ysLysSerSerGlyTyrllePheThrAanryrTyrlleHisTrpValArgGlnSerHisGlyLyaSerLe 
Raal Hpfal Fokl BatXI Nlalll Xba 

Ha 

*50 160 170 180 190 200 210 

AGACTAC ATCGGGTGGATCTACCCCGGTAATGCTAACACTAAGTACT-ACAATGACAACTTT AAAGCTAAG 

TGATGTCTCCCACCTAGATGGGCCCATTACCATTGTCATTCATGATGTTACTCTTGAAA 
uAapTyrlleGiyTrpXleTyrProGiyAanGlyAanThrLyaTyrTyrAanGluAanPhcLyaGlyLya 
I Sau3A Aval HaelllDdelRaal Oral 

XhoII Hpall seal 

Nell 

Hell 
Seal 
Xaal . 

220 230 2*0 250 260 270 280 

GCGACCCTTACTGTCGACAAATCTTCCTCAACTGCTTACATCGAGCTGCGTTCTTTGACCTCTGAGGACT 
AlaThrLeuThrVaiAapLyaSerSerSerThrAIaTyrMetGluLeuArgSerteuThrSerGluAapS 
Accl Mboll Alui Ddel Hinf 

Hindi MlalllBbvI 
Sail " FnulHI 

Taql 

290 300 310 320 330 3*0 350 

CCGCGCTATACTATTCCGCGGGCTCCTCTGGTAACAAAT GGGCCTTCGATTACTGGGGTCATGG CGCCTC 
- » - ^ GCAAGCTAATGACCCCACTACCGC 

erAIaValTyrTyrCyaAlaGlyScrSerClyAanLyaTrpAiaPheAapTyrTrpGlyHiaGIyAIaSe 
I Accl HhalBanll Maelll Haelll Ahall 

FnuDII FnuDII Sau96ITaaI 
Saell HinPINlalV 

360 370 
TGTTACTCTATCCTCATAGGATCC 
rvalThrValSerSer«am 
Maelll BaoHl 
Mlaiv 

Sau3A Fl&l. HO 

Xholl 
10 20 30 40 50 60 70

CAATTCCACCTCCTAATCACCCACACTCCCCTCTCTCTCCCCCTTTCTCTCCCTCACCACCCTTCTATTT 

CluPheA3pV«lValMetThrClnTlJpProLouSerLeuPpoV«lSerLeuClyAapClnAl*S«rIl«s 
EcoRI Aatll Hinfl Hpall BatEII 

^ AtoaI1 HphI EcoBII 

Taql ScpFI 

Ae Y z HatXli 
Maell 

80 90 100 110 120 130 140 

CTTCCCCC TCTTCCCACTCTATTCTCCACTCTAATGCTAACACTTACCTCGATTGCTAC CTGCAAAACGC 

AACGCCGAGAAGCCTCAGATAACACGTCAGATTACCATTCTGAATOGACCTAAC 

eruyaArgsepSerClnSerlleVaiHiaSerAanGiyAanThrTyrLeuAapTrpTyrLeuGlnLyaAl 
F™4HI HgiAI Maelll EcoBII BanI 

MboIX ScrFI Kpnl 

HglEII MlalV 
Baal 

'50 160 170 180 190 200 210 

TGGTCAGTCTCCGAAGCTTCTGATCTACAAAGTCTCTAACCGCTTCTCTGGTCTCCCCGATCGTTTCTCT 
aGlyGlnSerppoLyaLeuLeuIXeTyrLyaValSerAsnArgPheSerGlyValProAspAPgPheSep 
Alul Sau3A Hpall 
Hindlll HciISau3A 

ScrFI 

220 230 240 250 260 270 280 

CGTTCTCGTTCTCGTACTGACTTCACCCTGAAGATCTCTCGTCTCGACC CCGAGGATCTGGGTATCTACT 

ugctcctaGaCCCataAATSA 

GlySerGlySerGXyThrAapPheThrLeuLyalleSerArgValGIuAlaGluAapUeuGlylleTyrT 
Baal HphI Bglll Taql Haelll Sau3A 

MbolX Xnoll 
Sau3A 
XhoII 

290 300 310 320 330 3*0 350 

TGACGAAGGTCCCCAGAGTACATGGCACCTGGAAGCCGCCACCCTGCTTCCAGCT 
yrCyaPheGInGlySerHiaValProTrpThrPheGlyGlyClyThrtyaLeuGluIleLy3Arg«op 

EcoBII BanI Alul Sau3A Maell BaaHI 

ScpFI Raal Sau96I HlalV Aval MlalV 

HglEII Taqi Sau3A 

Xhol XhoII 
[Drawing sheet: nucleotide sequence with amino acid translation and restriction-site map; text not recoverable from the scan.]
10 20 30 40 50 60 70

C1ATTCATCGAATCTCTTCTCACTCJICCCCCCCTCTCTATCTCGTGCACCCCCTCAACGCCTAACT1TCT 

GluPheMetGlaSerValLeuThrGlBProProSerValSerGlyAlaProGlyGlnArgV.lThrlltS 
EcoHI Hlnfl OdelFnulHI HgiAtHpall FnuDII 

H1 * 111 Hlnfl NelXHlnelX Naelll 

Xb »I ScrFX Mlul 

80 90 10Q 110 120 130 i«o 

CTTGCCGTTCCTCTCAGTCTATTGTCCATTCTAATGGCAACACTTATCTGGAATGGTACCAACAACTGCC 
erCyaArgSerSerGln SerlleValHlaSgrAanGlvAanThrTyrLeuGlu TrpTyrClnGlnL^uP^ 
Ddel flatXI Ban I H p 

Kpnl Me 

NlalV . Se 
Raal 

'50 160 170 180 190 200 210 

GGGCACCGCGCCGAAGCTGCTGATCTTTAAAGTATCTAATCGCTTCTCTGGCGTACCGGATCGATTCTCT 
QGlyThrAlaProLyaLeul.euIlePfae Ly3ValSerA3nArgPheSerGlv tf a iPr«A i »nfl^,PK.<.- 
all FnuOII Alul Oral L 8 »al Clal 

*It ufL ! bvI Sau3A HpaTT Hlnfl 

rFI HinPI Fnu«HI s , u3A 

BanI TaqI 

220 230 2«0 250 260 270 280 

GTATCTAAGTCTGCCTCCTCTGCCACTCTGGCGATCACTGGTCTGCAAGCAGAAGATGAGCCCGATTACT 
ValSerLyaSerGlySerSerAlaThrLeuAlalleThrGlyLeuGlnAlaGluAapGluAlaAapTyrT 
Ddel Ml. IV Bgll Sau3A HboII Haelll 

290 300 310 320 330 340 350 

ACTGTTTTCAAGCCTCTCATGTACCCTGGACCTTCCGTGGTGGCACCAAGCTTACTGTACTCCGTCAGCC 
yrCyaPheGlnGlySerHlaValProTrpThrPheGl y Glvr.WTh^i r .i...T»..^.i,>...- |I ^ 1n p r 

Hlalll Avail BanI Alul Baa I Hgal 

Baal Sau96I MlalV Hind III 

HgiEII — — 

360 

GTAACTGCAG - ,, r- 

o'oeLeuGln F I G». 

Pat I 
Haelll 
[Drawing sheet: nucleotide sequence with amino acid translation and restriction-site map; text not recoverable from the scan.]
10 20 30 40 50 60 70

GAATTCATGGCTGACAACAAATTCAACAAGGAACAGCAGAACGCGTTCTACGAGATCTO 

EFMADNKFNKEQQNAFYEILHLFNL 
EcoRI Mlul Bglll BspMI+ 

XmnZ 

85 95 105 115 125 135 145 

AACGAAGAGCAGCGTAACGGCTTCATCCAAAGCTTGAAAGACGACCCGTCT 
NEEQRNGFIQSLKDDPSQSANLLAE 

Hindlll BspMI+ 

EC047III 

160 170 180 190 200 210 220 

GCCAAGAAACTGAACGACGCTCAGGCGCCGAAGAGTGATCCCGAAGTTCAAC^ 
AKKLNDAQAPKSDPEVQLQQSGPEL

Narl Pstl 

235 245 255 265 275 285 295 

GTTAAACCTGGCGCCTCTGTGCGCATGTCCTGCAAATCCTCTGGGTACATTTTCACCGACTTCTACATGAATTGG 
VKPGASVRMSCKSSGYIFTDFYMNW

Narl Fspl 

310 320 330 340 350 360 370 

GTTCGCCAGTCTCATGGTAAGTCTCTAGACTACATCGGGTACATTTCCCCATACTCTGGGGTTACCGGCTACAAC 
VRQSHGKSLDYIGYISPYSGVTGYN 
BStXI Xbal PflMI BstEII 

385 395 405 415 425 435 445 

CAGAAGTTTAAAGGTAAGGCGACCCTTACTGTCGACAAATCTTC 
QKFKGKATLTVDKSSSTAYMELRSL
Dral Sail 

460 470 480 490 500 510 520 

ACCTCTGAGGACTCCGCGGTATACTATTGCGCGGGCTCCTCTGGTAACAAATGGGCCATGGATTATTGGGGTCAT 
TSEDSAVYYCAGSSGNKWAMDYWGH
SacII Ncol 

535 545 555 565 575 585 595 

GGTGCTAGCGTTACTGTGAGCTCTGGTGGCGGTGGGTCGGGCGGTGGTGGCTCGGGTGGCGGCGGATCCGACGTC 
GASVTVSSGGGGSGGGGSGGGGSDV
NheI SacI BamHI AatII

610 620 630 640 650 660 670 

GTTGTTACCCAGACTCCGCTGTCTCTGCCGGTTTCTCTGGGTGACCAGGCTTCTATTTCTTGCCGCTCTTCCCAG 
VVTQTPLSLPVSLGDQASISCRSSQ 

BStEIl PflM 

685 695 705 715 725 735 745 

TCTCTGGTCCATTCTAATGGTAACACTTACCTGAACTGGTACCTGCAAAAGGCTGGTCACT 

SLVHSNGNTYLNWYLQKAGQSPKLL 
I BstXI BspMI+ Hindlll 

Rpnl 



FIG. 6A-1
760 770 780 790 800 810 820

ATCTACAAAGTCTCTAACCGCTTCTCTGGTGTCCCGGATCGTTTCTCTGGTTCTGGTT 
IYKVSHRFSGVPDRPSCSG_SGTDFT 

835 845 855 865 875 885 895 

CTGAAGATCTCTCGTGTCGAGGCCGAAGACCTGGGTATCTACTTCTGCTCTCAGACTACTCATGTACCGCCGACT 
LKISRVEAEDLGIYFCSQTTHVPPT 
Bglll 

910 920 930 940 

TTTGGTGGTGGCACCAAGCTCGAGATTAAACGTTAACTGCAG 
FGGGTKLE I K R * 

XhoI HpaI PstI
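
As a consistency check on the listing above, the 75-base line that spans the junction between the two variable domains can be translated and searched for the restriction sites labelled beneath it. The Python sketch below does this with the standard genetic code. The segment is copied from that line of the listing; the recognition sequences are the standard ones for the named enzymes; the variable names, the comments marking domain boundaries, and the slicing offsets are illustrative assumptions.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in TCAG order for each codon position.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Recognition sequences of the enzymes labelled under this line of the figure.
SITES = {"NheI": "GCTAGC", "SacI": "GAGCTC", "BamHI": "GGATCC", "AatII": "GACGTC"}

# One 75-base line of the listing, split here only for readability.
SEGMENT = (
    "GGTGCTAGCGTTACTGTGAGCTCT"  # end of the upstream variable domain
    "GGTGGCGGTGGGTCG"           # linker repeat 1
    "GGCGGTGGTGGCTCG"           # linker repeat 2
    "GGTGGCGGCGGATCC"           # linker repeat 3
    "GACGTC"                    # start of the downstream variable domain
)

def translate(dna):
    """Translate complete codons of dna into one-letter amino acid codes."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def find_sites(dna, sites=SITES):
    """0-based offset of the first occurrence of each site, or -1 if absent."""
    return {name: dna.find(seq) for name, seq in sites.items()}

protein = translate(SEGMENT)
linker = protein[8:23]          # the 15 residues between the two domains
offsets = find_sites(SEGMENT)
print(protein)
print(linker, linker == "GGGGS" * 3)
print(offsets)
```

The translation reproduces the amino acid line printed under the sequence, the linker reads as three consecutive Gly-Gly-Gly-Gly-Ser units, and the four sites occur in the order in which they are labelled.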
10 20 30 40 50 60

GATCCTGACGTCGTAATGACCCAGACTCCGCTGTCTCTGCCGGTTTCTCTGGGTGACCAG 
0 P D VVflTOTPLSLPVSL G D O 
Aatll BmtEIX 

70 BO 90 IOO 11 120 

BCTTCTAT7TCTTBCC6CTCTTCCCASTCTCTBSTCCATTCTAATBSTAACACTTACCTB 
ASZ SCRSSQSLVHS -NBNTYL 

PflMI BatXX 

130 140 150 160 170 1B0 

AACTGBTACCTSCAAAABGCTGGTCAGTCTCC6AABCTTCTBATCTACAAAGTCTCTAAC 
NWYLQICAGQSPICLLIYKVSN 

B*pM* Hindi I I 

KpnX : 

190 200 210 220 230 240 

CSCTTCTCTBGTBTCCCBGATCBTTTCTCTBBTTCTBSTTCTGBTACTBACTTCACCCTB 
RFSBVPDRFS6B6SGT0FTL 

250 260 270 2BO 290 300 

AABATCTCTCGTBTCBAGGCCGAABACCTBBBTATCTACTTCTBCTCTCABACTACTCAT 
KISRVEAEDLBI YFCSQTTH 

BglXX 

■»1C 320 330 340 350 360 

GTACCGCCGACTTTTGGTGGTGGCACCAAGCTCGAGATTAAACGTGGATCTGGAGGTG5C 
VPPTFGGGTKLEX K R 6 S 6 B 6 

Xhol 

370 380 390 400 410 420 

BGATCTGGTGGAGGTGGCTCTGGT6GCG6TGGATCCBAAGTTCAATTGCAGCAGTCTGGT 
GSBGBBSBB6BSEV0LQQSB 

BamHX 

430 440 450 460 470 4B0 

CCTGAATTGGTTAAACCTGGCGCCTCTGTGCGCATGTCCTGCAAATCCTCTGGGTACATT 
PELVKPGASVRMSCKSSGYI 
Narl FspX 

490 500 510 520 530 540 

TTCACCGACTTCTACATGAATTGGGTTC6CCAGTCTCATGGTAAGTCTCTAGACTACATC 
FTDFYMNWVRQSHGKSLDYI 

BstXI Xbal 

550 560 570 ' 5B0 590 600 

GG6TACATTTCCCCATACTCTBGBGTTACCBGCTACAACCAGAAGTTTAAAGGTAAGGCG 
GY1SPYSGVTGYNQKFKGKA 
Pfllll BstEXI . Oral 

610 620 630 640 650 660 

ACCCTTACTGTCGACAAATCTTCCTCAACTGCTTACATGGAGCTGCGTTCTTTGACCTCT 
TLTVDKSSSTAYMELRSLTS

SalI

670 680 690 700 710 720

GAGGACTCCGCGGTATACTATTGCGCGGGCTCCTCTGGTAACAAATGGGCCATGGATTAT
EDSAVYYCAGSSGNKWAMDY

SacII NcoI

730 740 750 760

TGGGGTCATGGTGCTAGCGTTACTGTGAGCTCTTAACTGCAG
10 20 30 40 50 60

GAAGTTCAACTGGAGCAGTCTGGTCCTGGATTGGTTCGACCTTCCCAGACTCTGTCCCTG 
EVQLEQSGPGLVRPSQTLSL 

70 80 90 100 110 120 

ACCTGCACATCCTCTGGGTACATTTTCACCGACTTCTACATGAATTGGGTTCGCCAGCCT 
TCTSSGYIFTDFYMNWVRQP

BspMI+ BstXI

130 140 150 160 170 180 

CCTGGTCGGGGTCTAGACTACATCGGGTACATTTCCCCATACTCTGGGGTTACCGGCTAC 

PGRGLDYIGYISPYSGVTGY
XbaI PflMI BstEII

190 200 210 220 230 240 

AACCAGAAGTTTAAAGGTAAGGCGACCCTTCTGGTCAACAAATCTAAGAACCAGGCTTCC 
NQKFKGKATLLVNKSKNQAS 

DraI

250 260 270 280 290 300

CTGCGGCTGTCTTCTGTGACCGCTGCGGACACCGCGGTATACTATTGCGCGGGCTCCTCT
LRLSSVTAADTAVYYCAGSS

SacII

310 320 330 340 350 360

GGTAACAAATGGGCCATGGATTATTGGGGTCAGGGTTCTCTGGTTACTGTGAGCTCTGGT
GNKWAMDYWGQGSLVTVSSG

NcoI SacI

370 380 390 400 410 420

GGCGGTGGGTCGGGCGGTGGTGGCTCGGGTGGCGGCGGATCCGACGTCGTTATGACCCAG
GGGSGGGGSGGGGSDVVMTQ

BamHI AatII

430 440 450 460 470 480 

CCTCCGTCGGTTTCGGGGGCTCCTGGTCAGCGGGTTACTATTTCTTGCCGCTCTTCCCAG
PPSVSGAPGQRVTISCR S S Q 

PflM 

490 500 510 520 530 540 

TCTCTGGTCCATTCTAATGGTAACACTTACCTGAACTGGTACCAGCAACTGCCTGGTACG 
SLVHSNGNTYLNWYQQLPGT 

I BstXI KpnI

550 560 570 580 590 600 

GCTCCGAAGCTTCTGATCTACAAAGTCTCTAACCGCTTCTCTGGTGTCCCGGATCGTTTC 
APKLLIYKVSNRFSGVPDRF 

HindIII

610 620 630 640 650 660 

TCTGGTTCTGGTTCTGGTACTGACTTCACCCTGGCGATCACTGGTCTCCAGGCCGAAGAC 
SGSGSGTDFTLAITGLQAED 

670 680 690 700 710 720 

GAGGCTGACTACTTCTGCTCTCAGACTACTCATGTACCGCCGACTTTTGGTGGTGGCACC
EADYFCSQTTHVPPTFGGGT

730 740 750

AAGCTCACGGTTCTGCGTTAACTGCAG

KLTVLR*LQ
HpaI PstI
10 20 30 40 50 60

GAATTCGAAGTTCAACTGCAGCAGTCTGGTCCTGAATTGGTTAAACCTGGCGCCTCTGTG
EFEVQLQQSGPELVKPGASV
AsuII PstI NarI Fs

EcoRI

70 80 90 100 110 120

CGCATGTCCTGCAAATCCTCTGGGTACACCTTCACCAACTATTACATCCACTGGCTTAAG

RMSCKSSGYTFTNYYIHWLK
pI AflII

130 140 150 160 170 180 

CAGTCTCATGGTAAGTCTCTAGAGTGGATCGGTTGGATTTACCCGGGTAATGGTAACACT 
QSHGKSLEWIGWIYPGNGNT 

XbaI SmaI

190 200 210 220 230 240 

AAGTACAATGAGAACTTTAAAGGTAAGGCGACCCTTACTGTCGACAAATCTTCCTCAACT 
KYNENFKGKATLTVDKSSST

DraI SalI

250 260 270 280 290 300

GCTTACATGGAGCTGCGTTCTTTGACCTCTGAGGACTCCGCGGTATACTATTGCGCGCGT
AYMELRSLTSEDSAVYYCAR

SacII BssHII 

310 320 330 340 350 360 

TACACTCATTATTACTTCGATTATTGGGGCCATGGCGCTAGCGTTACCGTGAGCTCTGGT 
YTHYYFDYWGHGASVTVSSG 

NcoI NheI SacI

370 380 390 400 410 420 

GGCGGTGGCTCGGGCGGTGGTGGGTCGGGTGGCGGCGGATCCGACGTCGTTATGACCCAG 
GGGSGGGGSGGGGSDVVMTQ

BamHI AatII

430 440 450 460 470 480

ACTCCGCTGTCTCTGCCGGTTTCTCTGGGTGACCAGGCTTCTATTTCTTGCCGCTCTTCC
TPLSLPVSLGDQASISCRSS 

BstEII 

490 500 510 520 530 540 

CAGTCTATCGTCCATTCTAATGGTAACACTTACCTGGAGTGGTACCTGCAAAAGGCTGGT 
QSIVHSNGNTYLEWYLQKAG 
BstXI BspMI+ 

KpnI

550 560 570 580 590 600 

CAGTCTCCGAAGCTTCTGATCTACAAAGTCTCTAACCGCTTCTCTGGTGTCCCGGATCGT

QSPKLLIYKVSNRFSGVPDR
HindIII

610 620 630 640 650 660

TTCTCTGGTTCTGGTTCTGGTACTGACTTCACCCTGAAGATCTCTCGTGTCGAGGCCGAG
FSGSGSGTDFTLKISRVEAE

BglII

670 680 690 700 710 720 

GATCTGGGTATCTACTACTGCTTCCAAGGGTCTCATGTACCGTGGACTTTCGGCGGTGGG
DLGIYYCFQGSHVPWTFGGG

730 740 750

ACCAAGCTCGAGATTAAACGTTAACTGCAG

TKLEIKR*LQ

XhoI HpaI PstI
10 20 30 40 50 60

GATCCCGAGGTTATGCTGGTTGAATCTGGTGGAGTACTGATGGAACCTGGTGGGTCCCTG 
DPEVMLVESGGVLMEPGGSL 

ScaI EcoO

70 80 90 100 110 120 

AAGCTGAGCTGTGCTGCTAGCGGCTTCACGTTCTCTCGTTACGCCATGTCTTGGGTCCGT 
KLSCAASGFTFSRYAMSWVR
EspI NheI PflMI

130 140 150 160 170 180

CAGACTCCGGAGAAGCGTCTAGAGTGGGTCGCGACGATATCTTCTGGTGGTTCTCACACG
QTPEKRLEWVATISSGGSHT
BspMII XbaI NruI EcoRV

190 200 210 220 230 240 

TTCCATCCAGACAGTGTGAAGGGTCGATTCACGATCTCTCGAGACAACGCTAAGAACACG 
FHPDSVKGRFTISRDNAKNT 

XhoI

250 260 270 280 290 300 

TTGTACCTGCAAATGTCTTCTCTACGTAGTGAAGATACTGCTATGTACTACTGTGCACGT
LYLQMSSLRSEDTAMYYCAR
BspMI+ SnaBI ApaLI 

310 320 330 340 350 360 

CCTCCACTGATCTCACTAGTTGCTGATTATGCCATGGATTATTGGGGTCATGGTGCTAGC 
PPLISLVADYAMDYWGHGAS 
SpeI NcoI NheI

370 380 390 400 410 420

GTTACTGTGAGCTCTGGTGGCGGTGGGTCGGGCGGTGGTGGCTCGGGTGGCGGCGGATCG

VTVSSGGGGSGGGGSGGGGS
SacI

430 440 450 460 470 480 

GATATCGTTATGACTCAGTCTCATAAGTTCATGTCCACTTCTGTTGGTGACCGTGTTTCT 

DIVMTQSHKFMSTSVGDRVS 
EcoRV BstEII

490 500 510 520 530 540 

ATCACTTGTAAGGCCAGCCAGGATGTGGGTGCTGCTATCGCATGGTATCAGCAGAAGCCC 
ITCKASQDVGAAIAWYQQKP 
PflMI Sma 

550 560 570 580 590 600 

GGGCAGTCTCCTAAGCTGCTGATCTACTGGGCGTCGACTCGTCATACTGGTGTCCCGGAT 

GQSPKLLIYWASTRHTGVPD
I SalI

610 620 630 640 650 660

CGTTTCACTGGGTCCGGATCAGGTACTGATTTCACTCTGACTATTTCGAACGTTCAGTCT
RFTGSGSGTDFTLTISNVQS
BspMII AsuII

670 680 690 700 710 720 

GATGACCTGGCTGATTACTTCTGCCAGCAATATTCCGGGTACCCTCTGACTTTCGGTGCC 
DDLADYFCQQYSGYPLTFGA 

SspI KpnI Nae

730 740 750

GGCACTAAACTCGAGCTGAAGTAACTGCAG

GTKLELK*
I XhoI PstI



[Figure sheet not legible in the source scan: a nucleotide sequence (ruler positions 10 to 750) with amino acid translation and restriction-site annotations; the panel label ends in "E".]